Abstract


In this thesis, we study two classes of problems: routing and classification.
Routing problems include those that concern the tradeoff between routing table
size and short-path forwarding (Part I), and the classic Edge Disjoint Paths
problem (Part II). Both have applications in communication networks, especially
in overlay networks, and in large and high-speed networks, such as optical
networks. The third part of this thesis concerns a type of classification
problem that is motivated by a computational biology problem, where it is
desirable that a small amount of genotype data from each individual is
sufficient to classify individuals according to their populations of origin.

In hierarchical routing, we obtain “near-optimal” routing table size and path
stretch through a randomized hierarchical decomposition scheme in the metric
space induced by a graph. We say that a metric $(X, d)$ has doubling dimension
$\dim(X)$ at most $\alpha$ if every set of diameter $D$ can be covered by
$2^{\alpha}$ sets of diameter $D/2$. (A doubling metric is one whose doubling
dimension $\dim(X)$ is a constant.) For a connected graph $G$ whose shortest
path distances $d_G$ induce the doubling metric $(X, d_G)$, we show how to
perform $(1+\tau)$-stretch routing on $G$ for any $0 < \tau < 1$ with routing
tables of size at most $(\alpha/\tau)^{O(\alpha)} \log \Delta \log \delta$ bits
with only $(\alpha/\tau)^{O(\alpha)} \log \Delta$ entries, where $\Delta$ is the
diameter of $G$ and $\delta$ is the maximum degree of $G$. Hence, the number of
routing table entries is just $\tau^{-O(1)} \log \Delta$ for doubling metrics.

The Edge Disjoint Paths (EDP) problem in undirected graphs refers to the
following: Given a graph $G$ with $n$ nodes and a set $T$ of pairs of terminals,
connect as many terminal pairs as possible using paths that are mutually edge
disjoint. This leads to a variety of classic NP-complete problems, for which
approximability is not well understood. We show a polylogarithmic approximation
algorithm for the undirected EDP problem in general graphs with a moderate
restriction on graph connectivity: we require the global minimum cut of $G$ to
be $\Omega(\log^5 n)$. Previously, constant or polylogarithmic approximation
algorithms were known for trees with parallel edges, expanders, grids and
grid-like graphs, and, most recently, even-degree planar graphs. These graphs
either have special structure (e.g., they exclude minors) or there are large
numbers of short disjoint paths. Our algorithm extends previous techniques in
that it applies to graphs with high diameters and asymptotically large minors.

In the classification problem, we are given a set of $2N$ diploid individuals
from populations $P_1$ and $P_2$ (with no admixture), and a small amount of
multilocus genotype data from the same set of $K$ loci for all $2N$ individuals,
and we aim to partition $P_1$ and $P_2$ perfectly. Each population $P_a$, where
$a \in \{1, 2\}$, is characterized by a set of allele frequencies at each locus.
In our model, given the population of origin of each individual, the genotypes
are assumed to be generated by drawing alleles independently at random across
the $K$ loci, each from its own distribution. For example, each SNP (Single
Nucleotide Polymorphism) has two alleles, which we denote with bit 1 and bit 0
respectively. In addition, each locus contains two bits (one from each parent)
that are assumed to be two random draws from the same Bernoulli distribution.

We use $p_1^k$ and $p_2^k$, $\forall k = 1, \ldots, K$, to denote the frequency
of an allele mapping to bit 1 at locus $k$ in $P_1$ and $P_2$, respectively. We
use $\gamma$ as the dissimilarity measure between $P_1$ and $P_2$. We compute
the number of loci $K$ that we need to perform different tasks, versus $N$ and
$\gamma$, and prove several theorems. Ultimately,
we  show  that  with  probability  1  —  l/poly(N),  given  that  K  =  and 

K  —  we  can  recognize  the  perfect  partition  (^1,^2)  from  among  all  other 

balanced  partitions  of  the  2N  individuals.  We  proved  this  theorem  for  two  cases: 
either  we  are  given  two  random  draws  for  each  attribute  along  each  dimension,  or 
only  one. 

1 Introduction


This  thesis  concerns  three  problems:  hierarchical  routing,  edge-disjoint  paths,  and 
classification. 

1.1  Hierarchical  Routing  and  Hierarchical  Decompositions 

In seminal work, Kleinrock and Kamoun [1977] describe a hierarchical routing
scheme based on an “optimal” hierarchical clustering model of nodes in the
network. They further show that for a class of large distributed networks, by
following their routing scheme, it is possible to achieve a substantial
reduction in routing table size with essentially no increase in the average
path length, over all source-destination pairs in the network.

Essentially, the family of networks upon which it is possible to apply such an
“optimal” hierarchical clustering scheme satisfies certain growth properties
such that: (a) the diameter of any cluster $S$ of nodes chosen is bounded above
by $O(|S|^{\nu})$ for some constant $\nu \in [0, 1]$, and (b) the average
distance between nodes in the network is $\Theta(N^{\nu})$, where $N$ is the
size of the network.

While some recent papers by Plaxton et al. [1999]; Karger and Ruhl [2002];
Hildrum et al. [2002] on distributed object location in peer-to-peer networks
used definitions and restrictions that differ slightly from each other, the
essential theme was to reduce the “intrinsic complexity” of each problem in its
own context by bounding the growth rate of networks, as done by Kleinrock and
Kamoun.

We design the piece that is missing from Kleinrock and Kamoun [1977]: a
hierarchical decomposition algorithm. We further improve their results by giving
bounds on path stretch on a per node-pair level using slightly different
assumptions on the network growth. Specifically, we capture the network growth
and parameterize the inherent “complexity” of a metric space $(X, d)$ generated
by such a network using its doubling dimension $\dim(X)$: the least value
$\alpha$ such that each ball of radius $R$ can be covered by at most
$2^{\alpha}$ balls of radius $R/2$.

We  show  the  following  result. 

Theorem 1.1. Given any network $G$ whose shortest path distances $d_G$ induce
the doubling metric $(X, d_G)$ with $\dim(X) = \alpha$, and any $\tau > 0$,
there is a routing scheme on $G$ that achieves $(1+\tau)$-stretch, where each
node stores only $(\alpha/\tau)^{O(\alpha)} \log \Delta \log \delta$ bits of
routing information, where $\Delta$ is the diameter of $G$ and $\delta$ is the
maximum degree of $G$.

Note that for any $\alpha \in \mathbb{Z}$, the space $\mathbb{R}^{\alpha}$ under
any of the $\ell_p$ norms has doubling dimension $\Theta(\alpha)$, and hence the
doubling dimension extends the standard notion of geometric dimension. This also
allows us to conclude that, in order to obtain a near-optimal routing scheme in
terms of path stretch and routing table size, all we need is a simple
restriction on how fast the network grows.
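
To make the covering definition of doubling dimension concrete, the following Python sketch computes an upper bound on the doubling dimension of a small finite metric by greedily covering every ball with balls of half the radius. It is only an illustration under stated assumptions: balls are taken to be closed, the metric is supplied as a function `dist`, and the helper names are hypothetical rather than taken from the thesis. Because the greedy cover may use more balls than an optimal cover, the result is an upper bound, not the exact value.

```python
import math
from itertools import combinations


def greedy_half_cover(points, dist, center, radius):
    """Number of radius/2 balls a greedy pass uses to cover B(center, radius).

    Closed balls are assumed. The greedy cover is valid but not necessarily
    minimum, so the count is an upper bound on the true covering number.
    """
    uncovered = [p for p in points if dist(center, p) <= radius]
    count = 0
    while uncovered:
        c = uncovered[0]  # pick any uncovered point as the next center
        uncovered = [p for p in uncovered if dist(c, p) > radius / 2]
        count += 1
    return count


def doubling_dimension_upper_bound(points, dist):
    """log2 of the worst greedy cover over all centers and relevant radii.

    Only radii equal to some pairwise distance are tried: between two
    consecutive distances the ball stays the same while the half-radius
    balls only grow, so those radii are never harder to cover.
    """
    radii = sorted({dist(p, q) for p, q in combinations(points, 2)})
    worst = max(
        greedy_half_cover(points, dist, c, r) for c in points for r in radii
    )
    return math.log2(worst)


if __name__ == "__main__":
    line = list(range(8))  # eight equally spaced points on a line
    print(doubling_dimension_upper_bound(line, lambda a, b: abs(a - b)))
```

The pass is quadratic in the number of points per ball and is meant for toy inputs only.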

1.2 Edge-Disjoint Paths in Moderately Connected Graphs

In the second part of this thesis, we first explore approximation for the edge
disjoint paths (EDP) problem: Given a graph with $n$ nodes and a set of terminal
pairs, connect as many of the specified pairs as possible using paths that are
mutually edge disjoint. EDP has a multitude of applications in areas such as
VLSI design, routing and admission control in large-scale, high-speed and
optical networks. Moreover, EDP and its variants have also been prominent topics
in combinatorics and theoretical computer science for decades. For example, the
celebrated theory of graph minors by Robertson and Seymour [1990] gives a
polynomial time algorithm for routing all the pairs given a constant number of
pairs. However, varying the number of terminal pairs leads to a variety of
classic NP-complete problems, for which approximability is an interesting
problem. In a recent breakthrough, Andrews and Zhang [2005b] showed an
$\Omega(\log^{1/3 - \epsilon} n)$ lower bound on the hardness of approximation
for undirected EDP.
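
For readers who want a concrete picture of the problem statement, the sketch below implements a simple greedy baseline for EDP in Python: route each terminal pair along a shortest path in the remaining graph, then delete the edges used. This is not the approximation algorithm of this thesis and carries no approximation guarantee here; it assumes a simple undirected graph (no parallel edges), and the function names are illustrative.

```python
from collections import deque


def bfs_path(adj, s, t):
    """Shortest s-t path as a list of nodes, or None if t is unreachable."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


def greedy_edp(edges, pairs):
    """Route pairs one by one on shortest paths, removing used edges.

    Returns a dict mapping each routed pair to its path. The paths are
    mutually edge disjoint because every used edge is deleted at once.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    routed = {}
    for s, t in pairs:
        if s not in adj or t not in adj:
            continue
        path = bfs_path(adj, s, t)
        if path is None:
            continue
        routed[(s, t)] = path
        for u, v in zip(path, path[1:]):
            adj[u].discard(v)
            adj[v].discard(u)
    return routed


if __name__ == "__main__":
    cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(greedy_edp(cycle, [(0, 2), (1, 3)]))
```

On the 4-cycle in the usage example only one of the two pairs can be routed, since the first path consumes both edges at one endpoint of the second pair.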

In this work, we show a polylogarithmic approximation algorithm for the
undirected EDP problem in general graphs with a moderate restriction on graph
connectivity: we require that there are $\Omega(\log^5 n)$ edge disjoint paths
between every pair of vertices, i.e., the global min cut is of size
$\Omega(\log^5 n)$. If this moderately connected case holds, we can route
$\Omega(\mathrm{OPT}/\mathrm{polylog}\, n)$ pairs using disjoint paths with
congestion 1, where OPT is the maximum number of pairs that one can route edge
disjointly for the given EDP instance. Previously, constant or polylogarithmic
approximation algorithms were known for trees with parallel edges, expanders,
grids and grid-like graphs, and, most recently, even-degree planar graphs by
Kleinberg [2005]. These results rely either on excluding a minor (or other
structural properties) or on the fact that many very short paths exist. Our
algorithm extends previous techniques; for example, our graphs can have high
diameter and contain very large minors. We are hopeful that this constraint on
the global minimum cut can be removed if congestion on each edge is allowed to
be $O(\log \log n)$. Formally, we have the following result.

Theorem 1.2. There is a polylog $n$-approximation algorithm for the edge
disjoint paths problem in a general graph $G$ with minimum cut and node degree
$\Omega(\log^5 n)$.

1.3 Population Classification

In the third part of this thesis, we explore a type of classification problem
in the context of a computational biology problem. In particular, we aim to
classify individuals according to their populations of origin, based on only a
small amount of their genotype data.

In  seminal  work  by  Pritchard  et  al.  [2000],  two  types  of  clustering  methods 
are  described  for  using  multilocus  genotype  data  to  infer  population  structure  and 
assign  individuals  to  populations. 

(1) Distance-based Methods. These proceed by calculating a pairwise distance
matrix, whose entries give the distance between every pair of individuals.
This matrix may then be represented using some convenient graphical
representation (such as a tree or a multidimensional scaling plot) and clusters
may be identified by eye.

(2) Model-based Methods. These proceed by assuming that observations from
each cluster are random draws from some parametric model. Inference for
the parameters corresponding to each cluster is done jointly with inference
for the cluster membership of each individual, using standard statistical
methods (for example, maximum-likelihood or Bayesian methods).

A model-based clustering method is used by Pritchard et al. [2000]. They assume
a model in which there are $K$ populations, where $K$ may be unknown, and each
population is characterized by a set of allele frequencies at each locus.

While we follow essentially the same model, assuming no admixture, we fix
$K = 2$. We name our method graph-based; in some sense, it is similar to
Distance-based Methods in that we assign a score to every pair of individuals
that captures the degree of dissimilarity between them; the true novelty of our
approach, however, is that we construct a complete graph while assigning scores
to edges, such that in expectation, a balanced cut with the maximum score, which
we denote as the max-cut of the complete graph, will provide us the perfect
partition, i.e., the perfect partition indeed has the maximum score among all
balanced partitions in the complete graph, given a balanced input instance.

Our goal is to minimize the number of loci that we require in order to classify
the two populations, given a set of $2N$ diploid individuals from two
populations $P_1$ and $P_2$ and their genotypes from the same set of $K$ loci.
Recall that for diploid organisms the chromosomes come in pairs. A genotype is a
list of unordered pairs of alleles, such that one comes from each of the
parents.

Since each Single Nucleotide Polymorphism (SNP) has two variants (alleles),
we use bit 1 and bit 0 to denote them. Given the population of origin of each
individual, the genotypes are assumed to be generated by drawing alleles
independently from the appropriate population frequency distribution. We use
$p_1^k$ and $p_2^k$ to denote the “success” probability (frequency of an allele
mapping to bit 1) at locus $k$ in the population of origin 1 and 2 respectively.
Each locus contains two bits that are assumed to be two random draws from the
same Bernoulli distribution. We use $\gamma$ as the dissimilarity measure
against which we optimize the number of loci we need. We show three results
whose proofs range from straightforward to sophisticated.
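
The generative model described above can be simulated in a few lines of Python. This is a minimal sketch under the stated model only (independent loci, two Bernoulli draws per locus with the population's allele frequency); the function names and the example frequencies are hypothetical and chosen purely for illustration.

```python
import random


def sample_genotype(freqs, rng):
    """One diploid individual: at each locus k, two independent bits,
    each equal to 1 with probability freqs[k]."""
    return [(int(rng.random() < p), int(rng.random() < p)) for p in freqs]


def sample_population(freqs, n, rng):
    """n individuals drawn independently from the same allele frequencies."""
    return [sample_genotype(freqs, rng) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    p1 = [0.9, 0.2, 0.5]  # illustrative allele frequencies for population 1
    p2 = [0.1, 0.8, 0.5]  # illustrative allele frequencies for population 2
    pop1 = sample_population(p1, 3, rng)
    pop2 = sample_population(p2, 3, rng)
    print(pop1)
    print(pop2)
```

Each individual is a list of $K$ bit pairs, matching the description of a genotype as a list of pairs of alleles, one from each parent.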

1.3.1 Quartet-based Scores

In all three theorems, we assign the same score to a pair of individuals
$X, Y$, which measures the difference between the two individuals; hence a
higher score is more desirable for two individuals from different populations of
origin. Using this score, we can construct a complete graph where nodes are
individuals and the edge weight is the score between the two individuals. Note
that when we say the score for a cut, we mean the sum of scores on all edges
across the cut. In particular, we call this score $\mathrm{Pscore}(X, Y)$ for an
unordered pair of individuals $(X, Y)$:

Definition  1.1. 

$$\mathrm{Pscore}(X, Y) = \sum_{i=1}^{K} \mathrm{Pscore}^i(X, Y) = \sum_{i=1}^{K} \mathrm{Pscore}^i \begin{bmatrix} x_1 & x_2 \\ y_1 & y_2 \end{bmatrix}$$

where 


Pscore‘{X,Y) 


(4'  =x<2  +  4i  =>i )  (^4  =y\  +  ^4=4  ^ 

(4’  =4  +  4i  =4  ^  ~  ^“^4  =4  ^4=ri  ^ 


where $\mathbf{1}_{x = y} = 1$ if $x = y$, and $= 0$ otherwise.

Note that this definition utilizes an important quartet construction involving
four bits $x_1, x_2, y_1, y_2$, which are four independent Bernoulli random
variables, such that the two bits from each pair $(x_1, x_2)$, $(y_1, y_2)$ are
identically distributed.

The first theorem says that, given enough loci, all scores are correct in the
following sense.


Theorem  1.3.  (Global  Optimum  Lemma)  Let  IN  be  the  total  number  of  indi¬ 
viduals.  Given  that  K  >  18  InN /y^,  with  probability  1  —  0(1  /N^),  for  all  quartets 
$X, Y, Z_1, Z_2$ such that $X, Y$ come from different populations, while
$Z_1, Z_2$ come from the same population,

$$\mathrm{Pscore}(X, Y) > \mathrm{Pscore}(Z_1, Z_2).$$


This immediately implies that, given a balanced input instance where we have
the same number of individuals from each population, the max-cut (i.e., the cut
with the maximum score) separates the two populations perfectly. This gives us
a trivial algorithm for separating a balanced input instance into $P_1$ and
$P_2$ and assigning individuals correctly: simply keep the top $N^2$ edges (the
number of cross-population pairs) in terms of Pscore in the complete graph, and
these edges correspond to a max-cut that separates $P_1$ from $P_2$ perfectly.

For imbalanced input instances, the perfect partition $(P_1, P_2)$ has the
maximum average score, where the average score for a cut is defined as the total
score across edges in the cut divided by the number of such edges.

In addition, for imbalanced instances, we only need to adjust the constant in
the bound for $K$, so that all edges between individuals from different
populations are above a certain threshold $h$ while all other edges are below a
threshold $\ell < h$, given that the expected values for
$\mathrm{Pscore}(X, Y)$ and $\mathrm{Pscore}(Z_1, Z_2)$ differ significantly
from one another: $E[\mathrm{Pscore}(X, Y)] > 2K\gamma$ while
$E[\mathrm{Pscore}(Z_1, Z_2)] = 0$. By keeping only edges above a certain
threshold and by taking account of deviation, we keep edges that define a
perfect partition. Hence, this algorithm works for both input cases.

We call this theorem a Global Optimum Lemma, since there exists an overall
desirable ordering among all edge scores in the complete graph.

The second theorem says that, given we have some pre-classified individuals
from $P_1$ and $P_2$, $N$ from each origin, it requires fewer bits from a new
individual in order to put it on the correct side, since the sum of
dissimilarity scores from this new individual $X$ to the other population is
consistently higher than the sum of scores to its own population.

Theorem  1.4.  (Local  Optimum  Lemma).  Let  K  >  For 

any  X,  w.l.o.g.  from  Pi,  and  its  observed  bit  string  X,  with  probability  1  —  X  —  5, 
given  that  Xi,Yi,yi  are  individuals  randomly  draw  from  Pi  and  P2  respectively,  we 
have 


$$\sum_{i=1}^{N} \mathrm{Pscore}(X, Y_i \mid X = 1) > \sum_{i=1}^{N} \mathrm{Pscore}(X, X_i \mid X = 1).$$



A similar statement holds for any $Y$ from $P_2$.

This tells us that once we have $2N$ individuals, $N$ from each population, that
are already classified properly, a new node can almost always pick the correct
side to join based on its local view: it just needs to join the side to which it
has a lower total score over the $N$ individuals on that side. Hence, we refer
to this theorem as a Local Optimum Lemma.
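
The local rule just described can be sketched as follows. This is an illustration only: Hamming distance is used as a stand-in dissimilarity score (an assumption made here so that the example is self-contained, not the Pscore of Definition 1.1), individuals are represented as flat bit lists, and the function names are hypothetical.

```python
def hamming(x, y):
    """Stand-in dissimilarity score: number of positions where bits differ."""
    return sum(a != b for a, b in zip(x, y))


def assign_side(new, side1, side2, score=hamming):
    """Join the side with the lower total dissimilarity to the new individual.

    side1 and side2 are the already classified individuals; returns 1 or 2.
    Ties are broken in favor of side 1.
    """
    total1 = sum(score(new, z) for z in side1)
    total2 = sum(score(new, z) for z in side2)
    return 1 if total1 <= total2 else 2


if __name__ == "__main__":
    side1 = [[1, 1, 1, 0], [1, 1, 0, 0]]
    side2 = [[0, 0, 0, 1], [0, 0, 1, 1]]
    print(assign_side([1, 1, 1, 1], side1, side2))
    print(assign_side([0, 0, 0, 0], side1, side2))
```

Any pairwise dissimilarity can be passed as `score`, so the same rule applies once a concrete score function is available.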

Finally, we show a theorem that says that the perfect partition corresponds to
the max-cut in the complete graph, given any balanced input instance, by
requiring slightly more bits than necessary in the Local Optimum Lemma, but
still asymptotically fewer (in terms of $\gamma$) than in the Global Optimum
Lemma.

Theorem  1.5,  Given  that  K  =  0( and KN  =  where  N  >S,  with 

probability $1 - 1/\mathrm{poly}(N)$, we can differentiate the perfect partition
from all other balanced partitions of individuals.

1.3.2 Learning Mixtures of Product Distributions

After exploring the power of two random draws from any one dimensional
distribution in the $K$ dimensional distributions, we consider the possibility
of achieving the same power of clustering using a single random draw from each
dimension: given a small sample, i.e., $N$ is small, can we learn the partition
with a small number of attributes, if for each attribute, we are given only a
single bit from its Bernoulli distribution?

The  answer  is  positive.  We  show  the  following  theorem  using  an  inner  product 
based  score,  which  we  call  Rscore. 

Definition 1.2. $\mathrm{Rscore}(X, Y) = \langle x, y \rangle = \sum_{i=1}^{K} x_i y_i$

Theorem  1.6.  Given  that  K  =  and  KN  =  where  N  >  A, 

with probability $1 - 1/\mathrm{poly}(N)$, for all balanced cuts in the complete
graph formed among $2N$ sample points, we can differentiate the perfect
partition from all other balanced partitions of the sample by finding the
min-cut.
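
To illustrate what "finding the min-cut among balanced partitions" means with an inner product based score, the sketch below enumerates all balanced cuts of a tiny sample by brute force. It is exponential in the sample size and is not the procedure analyzed in the thesis; the sample points, their 0/1 encoding, and the function names are assumptions made for this toy example.

```python
from itertools import combinations


def rscore(x, y):
    """Inner product of two bit vectors."""
    return sum(a * b for a, b in zip(x, y))


def cut_weight(samples, side, score):
    """Total score over all pairs with exactly one endpoint in `side`."""
    side = set(side)
    other = [i for i in range(len(samples)) if i not in side]
    return sum(score(samples[i], samples[j]) for i in side for j in other)


def min_balanced_cut(samples, score=rscore):
    """Brute-force search over balanced partitions of an even-sized sample.

    Index 0 is fixed on one side so each partition is visited once.
    Returns (weight, side) for the first minimum-weight cut found.
    """
    n = len(samples)
    best = None
    for rest in combinations(range(1, n), n // 2 - 1):
        side = (0,) + rest
        w = cut_weight(samples, side, score)
        if best is None or w < best[0]:
            best = (w, side)
    return best


if __name__ == "__main__":
    # Two toy "populations" whose 1-bits sit on disjoint halves of the loci.
    samples = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    print(min_balanced_cut(samples))
```

In this hand-built example within-population pairs have inner product 2 and cross-population pairs have inner product 0, so the cut separating the two groups has the smallest weight.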

We note that a Hamming distance based score will give a similar claim, using
max-cut. We also note that neither Rscore nor a Hamming distance based score
will give us claims similar to the Global or Local Optimum Lemmas as in Theorem
1.3 and 1.5. However, for the special case that we know whether > P2 or vice
versa, then a simple bit-wise score that is similar to what we define below
suffices to prove a Global Optimum Lemma; in particular suppose that we know
V/,pj > p\\

Definition  1.3.  For  an  unordered  pair  of  individuals  let  BscorefX^Y)  = 

$$\mathrm{Bscore}(X, Y) = \sum_{i=1}^{K} \mathrm{Bscore}^i(X, Y).$$

We show that the absolute value of scores between points from the same
distribution is consistently below those between points from different
distributions in Theorem 9.2.

1.3.3  Related  Work 

There are two streams of related work. The first stream is the recent progress
in learning from the point of view of clustering, where given samples drawn from
a mixture of well-separated Gaussians (component distributions), one aims to
classify the sample according to which component distribution it comes from, as
studied in Dasgupta [1999]; Dasgupta and Schulman [2000]; Arora and Kannan
[2001]; Vempala and Wang [2002]; Achlioptas and McSherry [2005]; Kannan et al.
[2005]; Dasgupta et al. [2005]. This framework has also been extended to more
general distributions such as log-concave distributions in Achlioptas and
McSherry [2005]; Kannan et al. [2005] and heavy-tail distributions in Dasgupta
et al. [2005].

These works mostly focus on reducing the requirement on the sufficient
separation conditions between any two centers $P_1$ and $P_2$ in the mixture
from dependence on $K$, the number of dimensions, to dependence only on the
number of components in the mixture, in order to classify most of the sample
correctly. In contrast, we focus on the case that although we only have a
mixture of two product distributions, the sample size, i.e., the number of
individuals, is small; we prove that by acquiring a sufficient number of
attributes along the same set of dimensions from both distributions, with high
probability, we can correctly classify every node in the sample.

The second stream of work is under the Probably Approximately Correct (PAC)
framework, where given a sample generated from some target distribution $Z$, the
goal is to output a distribution $Z_1$ that is close to $Z$ according to the
Kullback-Leibler divergence $KL(Z \| Z_1)$, where $Z$ is a mixture of product
distributions over discrete domains or Gaussians (Kearns et al. [1994]; Freund
and Mansour [1999]; Cryan [1999]; Cryan et al. [2002]; Mossel and Roch [2005];
Feldman et al. [2005, 2006]). These works do not require a minimal distance
between any two distributions.

To  compare  our  results  with  learning  mixtures  of  Gaussians,  we  first  denote  the 
.^2-square  distance  between  the  centers  of  the  two  distributions:  |  |^i  —  ^2 1 12  =  = 

(1)  Theorem  1.3  requires  that  the  distance  between  two  distributions:  HFi  — 

i.e., the separation requirement depends on the number of dimensions of each
product distribution. This is comparable to that in Dasgupta and Schulman
[2000]; Arora and Kannan [2001].

(2)  Theorem  1.5  requires  that  d  —  ||Fi  —  F’2||2  =  fl(ln5A),  where  N  — 

which is independent of the dimension of the product distribution; this is
comparable to what Kannan et al. [2005], and Achlioptas and McSherry [2005]
accomplish for the continuous case.

1.4  Thesis  Outline 

Chapters 2 and 3 belong to Part I (Hierarchical Routing). Part II (Edge Disjoint
Paths) contains Chapters 4-6. Chapters 7-9 belong to Part III (Classification).
One can safely skip Chapter 5 while still being able to connect Chapter 4 with
Chapter 6.


Routing, Disjoint Paths, and Classification


Part I: Hierarchical Routing

2 Hierarchical Routing in Doubling Metrics


2.1  Introduction 

The doubling dimension of a metric space is the least value $\alpha$ such that
each ball of radius $R$ can be covered by at most $2^{\alpha}$ balls of radius
$R/2$ Gupta et al. [2003]. For any $\alpha \in \mathbb{Z}$, the space
$\mathbb{R}^{\alpha}$ under any of the $\ell_p$ norms has doubling dimension
$\Theta(\alpha)$, and hence the doubling dimension extends the standard notion
of geometric dimension; moreover, it can be seen as a way to parameterize the
inherent “complexity” of metrics.

In this chapter, we study the problem of designing routing algorithms for
networks whose structure is parameterized by the doubling dimension
$\dim(X) = \alpha$; we show that one can route along paths with stretch
$(1 + \tau)$ with small routing tables, with only
$O((\alpha/\tau)^{O(\alpha)} \log \Delta)$ entries, where $\Delta$ is the
diameter of the network $G$. Each entry stores at most $O(\log \delta)$ bits,
where $\delta$ is the maximum degree of $G$, and hence for doubling metrics,
where $\alpha$ is a constant, and any $\tau < 1$, we have $(1 + \tau)$-stretch
routing with only $O(\log \Delta \log \delta)$ bits of routing information at
each node.

The idea of placing restrictions on the growth rate of networks to bound their
“intrinsic complexity” is by no means novel; it has been around for a long time
(see, e.g., Kleinrock and Kamoun [1977]), and has recently been used in several
contexts in the literature on object location in peer-to-peer networks Plaxton
et al. [1999]; Karger and Ruhl [2002]; Hildrum et al. [2002]. While these papers
used definitions and restrictions that differ slightly from each other, we note
that our results hold in those models as well. Our results extend those of
Talwar [2004], whose routing schemes for metrics with $\dim(X) = \alpha$ require
local routing information of $\approx O(\log^{\alpha} \Delta)$ bits. Formally,
we have the following main result.

Theorem 2.1. Given any network $G$ whose shortest path distances $d_G$ induce
the doubling metric $(X, d_G)$ with $\dim(X) = \alpha$, and any $\tau > 0$,
there is a routing scheme on $G$ that achieves $(1+\tau)$-stretch and where each
node stores only $(\alpha/\tau)^{O(\alpha)} \log \Delta \log \delta$ bits of
routing information, where $\Delta$ is the diameter of $G$ and $\delta$ is the
maximum degree of $G$.

The proof of the theorem proceeds along familiar lines; we construct a set of
hierarchical decompositions (HDs) of the metric $(X, d)$, where each HD consists
of a set of successively finer partitions of $X$ with geometrically decreasing
diameters. Each node in $X$ maintains a table containing next hops to a small
subset of clusters in these partitions; to route a packet from $s$ to $t$, we
use the routing table for $s$ to pick some “small cluster” $C$ in $s$'s table
that contains $t$ and send the packet to some node $x$ in $C$; a similar process
repeats at node $x \in C$ until the packet reaches $t$. The idea is to create
routing tables which ensure that the distance from $x$ to $t$ is much smaller
than that from $s$ to $t$, and hence the detour taken in going from $s$ to $t$
is only $\tau\, d(s, t)$. (Details of routing schemes appear in Sections 2.4 and
3.1.)
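
The forwarding loop described above can be pictured with a toy Python sketch. It is not the routing scheme of Sections 2.4 and 3.1: the table layout (a list of `(cluster, representative)` pairs per node), the choice of the smallest cluster containing the destination, and the direct jump to the representative instead of hop-by-hop forwarding are all simplifying assumptions made for illustration.

```python
def route(tables, s, t, max_steps=100):
    """Follow cluster pointers from s towards t.

    tables[v] is a list of (cluster, representative) pairs, where cluster is
    a set of nodes and representative is a node inside it (hypothetical
    layout). Returns the list of nodes visited, or None if routing fails.
    """
    path = [s]
    current = s
    for _ in range(max_steps):
        if current == t:
            return path
        # Clusters in the current node's table that contain the destination.
        candidates = [
            (len(cluster), rep)
            for cluster, rep in tables[current]
            if t in cluster
        ]
        if not candidates:
            return None
        _, rep = min(candidates)  # prefer the smallest such cluster
        path.append(rep)
        current = rep
    return None


if __name__ == "__main__":
    tables = {
        0: [(frozenset({1, 2, 3}), 2), (frozenset({2, 3}), 2)],
        2: [(frozenset({3}), 3)],
        3: [],
    }
    print(route(tables, 0, 3))
```

Each step moves to a representative of a smaller cluster around the destination, which mirrors the idea that the remaining distance shrinks from one intermediate node to the next.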

While this framework is well-known, the standard ways to construct HDs are
top-down methods which iteratively refine partitions. These methods create
long-range dependencies which require us to build $O(\log n)$ HDs in general; in
order to use the locality of the doubling metrics and get away with $O(\alpha)$
HDs, we develop a bottom-up approach that avoids these dependencies when
building HDs. The analysis of this process uses the Lovász Local Lemma (much as
in Krauthgamer and Lee [2003]; Gupta et al. [2003]); details are given in
Section 2.3.

2.1.1  Related  Work 

Distributed packet routing protocols have been widely studied in the theoretical
computer science community; see, e.g., Frederickson and Janardan [1988, 1989];
Awerbuch and Peleg [1992]; Peleg and Upfal [1989]; Cowen [2001]; Peleg [2000],
or the survey by Gavoille [2001] on some of the issues and techniques. Note that
these results, however, are usually for general networks, or for networks with
some topological structure. By placing restrictions on the doubling dimension,
we are able to give results which degrade gracefully as the “complexity” of the
metric increases. For example, it is known that any universal routing algorithm
with stretch less than 3 requires some node to store at least $\Omega(n)$
routing information Gavoille and Gengler [2001]; however, these graphs generate
metrics with large $\dim(X)$. Our results thus allow one to circumvent these
lower bounds for metrics of “lower dimension”.

Packet routing in low dimensional networks has been previously studied in
Talwar [2004], which gives algorithms that require 0(a(;^)“(log“‘''^A)) bits of
information to be stored per node in order to achieve $(1+\tau)$-stretch
routing, for constant stretch $\tau$ and doubling dimension $\alpha$. The
resulting dependence of f?(log^“'““ A) should be contrasted with the dependence
of $O(\log \Delta \log \delta)$ bits of information in our schemes. We should
point out that his algorithms are based on graph decomposition ideas with a
top-down approach and do not require the LLL to construct routing tables.

One of the papers that influenced this work is that of Kleinrock and Kamoun
[1977]. They describe a general hierarchical clustering model on which our
routing schemes are based. They show that routing schemes based on a
hierarchical clustering model do not cause much increase in the average path
length for networks that satisfy the following two assumptions: (a) the diameter
of any cluster $S$ chosen is bounded above by $O(|S|^{\nu})$ for some constant
$\nu \in [0, 1]$, and (b) the average distance between nodes in the network is
$\Theta(n^{\nu})$. In contrast, we give bounds on the path stretch on a per
node-pair level using slightly different assumptions on the network geometry.

Other papers on object location in peer-to-peer networks (Plaxton et al. [1999];
Karger and Ruhl [2002]; Hildrum et al. [2002]) have also used restrictions similar
to those of Kleinrock and Kamoun [1977] on the growth rate of metrics; in particular, they
consider metrics where increasing the radius of any ball by a factor of 2 causes the
number of points in it to increase by at most some constant factor $2^{\beta}$. (Plaxton et
al. [1999] also consider a lower bound on the growth.) Here the
parameter $\beta$ can be considered to be another notion of “dimension” for a metric
space. It can be shown that $\dim(X) \le 4\beta$ [Gupta et al., 2003, Prop. 1.2]; hence
our results hold for such metrics as well. Our scheme is also similar in spirit to a
data-tracking scheme of Rajaraman et al. [2001], who use approximations by tree
distributions to obtain bounds on the stretch incurred.

2.2  Definitions  and  Notation 

Let the input metric be $(X,d)$; this paper deals with finite metrics with at least 2
points. We use standard terminology from the theory of metric spaces; many
definitions can be found in Deza and Laurent [1997] and Heinonen [2001]. Given
$x \in X$ and $r \ge 0$, we let $B(x,r)$ denote $\{x' \in X \mid d(x,x') \le r\}$, i.e., the ball of
radius $r$ around $x$. Given a subset $S \subseteq X$, the distance of $x \in X$ to the set $S$ is
$d(x,S) = \min\{d(x,x') \mid x' \in S\}$.

The doubling constant $\lambda_X$ of a metric space $(X,d)$ is the smallest value $\lambda$ such
that every ball in $X$ can be covered by $\lambda$ balls of half the radius. The doubling
dimension of $X$ is then defined as $\dim(X) = \log_2 \lambda_X$; we use the letter $\alpha$ to denote
$\dim(X)$. A metric is called doubling when its doubling dimension is a constant.
A subset $Y \subseteq X$ is an $r$-net of $X$ if (1) for every pair of distinct points $x,y \in Y$,
$d(x,y) \ge r$, and (2) $X \subseteq \bigcup_{y \in Y} B(y,r)$. Such nets always exist for any $r > 0$, and
can be found using a greedy algorithm.
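
The greedy construction of an $r$-net can be sketched as follows. This is a minimal illustration rather than a prescribed implementation: the function name `greedy_r_net`, the point representation, and the toy line metric are assumptions made here for the example.

```python
from typing import Callable, Hashable, Iterable, List


def greedy_r_net(points: Iterable[Hashable],
                 dist: Callable[[Hashable, Hashable], float],
                 r: float) -> List[Hashable]:
    """Greedily pick centers that are pairwise at distance >= r.

    Every rejected point lies within distance < r of an earlier center,
    so the chosen centers also cover all points with balls of radius r.
    """
    net: List[Hashable] = []
    for p in points:
        # Keep p only if it is far from every center chosen so far.
        if all(dist(p, y) >= r for y in net):
            net.append(p)
    return net


# Toy metric (an assumption for illustration): integers on a line.
line = list(range(10))
net = greedy_r_net(line, lambda a, b: abs(a - b), r=3)
```

The scan order is arbitrary; different orders may give different nets, but each satisfies both the packing and the covering condition.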

Proposition 2.1 (Gupta et al. [2003]). If all pairwise distances in a set $Y \subseteq X$ are
at least $r$ (e.g., when $Y$ is an $r$-net of $X$), then for any point $x \in X$ and radius $t$, we
have that $|B(x,t) \cap Y| \le \lambda^{\lceil \log_2 (2t/r) \rceil}$.

Proof. Applying the definition of the doubling constant of the input metric $(X,d)$,
$B(x,t)$ can be covered by $\lambda$ balls of radius $t/2$ centered around some vertices inside
$B(x,t)$. By applying the same definition at most $\lceil \log_2 (2t/r) \rceil$ times, one gets a cover of
$B(x,t)$ with $\lambda^{\lceil \log_2 (2t/r) \rceil}$ balls of radius $\le r/2$. Since all pairwise distances in $Y \subseteq X$
are at least $r$, no two points $y, y' \in Y$ can fall into the same ball of radius $\le r/2$; thus
each ball of radius $r/2$ covers at most one node from $Y$. Thus we have $|B(x,t) \cap Y| \le
\lambda^{\lceil \log_2 (2t/r) \rceil}$. □

A cluster $C$ in the metric $(X,d)$ is just a subset of points of the set $X$. The
diameter of the cluster $C$ is the largest distance between points of the cluster. Each
cluster is associated with a center $x \in X$ (which may not lie in $C$) and the radius of
the cluster $C$ is the smallest value $r$ such that the cluster $C$ is contained in $B(x,r)$.

Definition 2.1. Given $r > 0$, an $r$-ball partition $\Pi$ of $(X,d)$ is a partition of $X$ into
clusters $C_1, C_2, \ldots$, with each cluster $C_i$ having a radius at most $r$.

By scaling, let us assume that the smallest inter-point distance in $X$ is exactly 1.
Let $\Delta$ denote the diameter of the metric $(X,d)$, and hence $\Delta$ is also the aspect
ratio of the metric. Define $\rho = 256\alpha + 1$ and $h = \lceil \log_\rho \Delta \rceil$. Let us define $\eta_i =
1 + \rho + \rho^2 + \cdots + \rho^i < \rho^{i+1}/(\rho - 1)$; note that $\eta_i = \rho\,\eta_{i-1} + 1$. Let us fix a
$\rho^i/2$-net $N_i$ of the metric $(X,d)$, for every $0 \le i \le h+1$.
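
The level radii form a geometric sum, and the identities used later can be checked numerically. The sketch below assumes the level-$i$ radius bound is $1 + \rho + \cdots + \rho^i$; the helper name `eta` and the value `rho = 5` are illustrative choices, not values from the text.

```python
def eta(rho: int, i: int) -> int:
    """Level-i radius bound: 1 + rho + rho**2 + ... + rho**i."""
    return sum(rho ** k for k in range(i + 1))


rho = 5  # arbitrary illustrative value
etas = [eta(rho, i) for i in range(6)]

# Consecutive levels differ by exactly one power of rho, which is why
# the gap between level i and level i+1 radii equals rho**(i+1).
gaps = [etas[i + 1] - etas[i] for i in range(len(etas) - 1)]
```

Integer arithmetic is used throughout so that the comparisons are exact.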

2.2.1  Hierarchical  Decompositions  (HDs) 

We  now  give  a  formal  definition  of  a  hierarchical  decomposition  (HD)  which  is 
used  throughout  this  paper  and  is  the  basic  object  of  our  study.  As  noted  below, 
such  a  decomposition  can  be  naturally  associated  with  a  decomposition  tree  that  is 
used  for  our  hierarchical  routing  schemes. 

Definition 2.2. A $\rho$-hierarchical decomposition $\Pi$ ($\rho$-HD) of the metric $(X,d)$ is a
sequence of partitions $\Pi_0, \ldots, \Pi_h$ with $h = \lceil \log_\rho \Delta \rceil$ such that:

(1) The partition $\Pi_h$ has one cluster $X$, the entire set.

(2) (geometrically decreasing diameters) The partition $\Pi_i$ is an $\eta_i$-ball
partition. Since inter-point distances are at least 1, it implies that $\Pi_0 = \{\{x\} : x \in
X\}$; in other words, each cluster in $\Pi_0$ is a singleton vertex.

(3) (hierarchical) $\Pi_i$ is a refinement of $\Pi_{i+1}$, and each cluster in $\Pi_i$ is contained
within some cluster of $\Pi_{i+1}$.

Given such a $\rho$-HD $\Pi = (\Pi_i)_{i=0}^{h}$, the partition $\Pi_i$ is called the level-$i$ partition
of $\Pi$ and clusters in $\Pi_i$ are the level-$i$ clusters. Note that these clusters have
radius at most $\eta_i$ and hence diameter $\le 2\eta_i$. Furthermore, define the degree $\deg(\Pi)$ to be
the maximum number of level-$i$ clusters contained in any level-$(i+1)$ cluster in
$\Pi_{i+1}$, for all $0 \le i \le h-1$.

Hierarchical Decompositions and HSTs. A hierarchical decomposition is a laminar
family of sets, where given any two sets, they are either disjoint or one contains
the other. It is well known that such a family $\mathcal{F}$ of sets over $X$ can be associated
with a natural decomposition tree whose vertices are sets in $\mathcal{F}$ and whose leaves
are all the smallest sets in the family (which are elements of $X$, in this case). We
can use this to associate a so-called hierarchically well-separated tree (also called
an HST; Bartal [1996]) $T_\Pi$ with a hierarchical decomposition $\Pi$; since each edge in
$T_\Pi$ connects some $C \in \Pi_i$ and $C' \in \Pi_{i-1}$ with $C' \subseteq C$, we associate a length $\eta_i$ with
edge $(C,C')$. Given such a tree $T_\Pi$, we can (and indeed do) talk about its level-$i$
clusters with no ambiguity; these are the same level-$i$ clusters in the associated $\Pi_i$.
Note that the degree of vertices in this tree $T_\Pi$ is bounded by $\deg(\Pi) + 1$.

2.2.2  Padded  Probabilistic  Ball-Partitions 

Recall that an $r$-ball partition $\Pi$ of $(X,d)$ is a partition of $X$ into a set of clusters
$C \subseteq X$, each contained in a ball $B(v,r)$ for some $v \in X$. A ball $B(x,t)$ is cut in the partition
$\Pi$ if there is no cluster $C \in \Pi$ such that $B(x,t) \subseteq C$. In general, $B(x,t)$ is cut by a
set $S \subseteq X$ if both $S \cap B(x,t)$ and $B(x,t) \setminus S$ are non-empty.
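
The two notions of a ball being cut translate directly into set operations. The following sketch assumes closed balls (distance at most the radius) and represents clusters as Python sets; the function names and the toy line metric are hypothetical, chosen only for this example.

```python
from typing import Callable, Hashable, Iterable, List, Set


def ball(points: Iterable[Hashable],
         dist: Callable[[Hashable, Hashable], float],
         center: Hashable, radius: float) -> Set[Hashable]:
    """Closed ball: all points within distance `radius` of `center`."""
    return {p for p in points if dist(center, p) <= radius}


def is_cut_by_set(b: Set[Hashable], s: Set[Hashable]) -> bool:
    """A ball is cut by s if s contains some, but not all, of its points."""
    return bool(b & s) and bool(b - s)


def is_cut_in_partition(b: Set[Hashable], partition: List[Set[Hashable]]) -> bool:
    """A ball is cut in a partition if no single cluster contains it."""
    return not any(b <= cluster for cluster in partition)


# Toy example: ten points on a line, split into two clusters.
points = list(range(10))
d = lambda a, b: abs(a - b)
partition = [set(range(0, 5)), set(range(5, 10))]
inner = ball(points, d, 2, 1)      # lies inside the first cluster
straddle = ball(points, d, 4, 1)   # straddles the cluster boundary
```

Here `inner` is not cut, while `straddle` is cut both in the partition and by the first cluster taken as a set.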

Let $\mathcal{P}$ be the collection of all possible partitions of $X$, and hence $\Pi \in \mathcal{P}$. Given a
partition $\Pi \in \mathcal{P}$ and $x \in X$, let $C_\Pi(x)$ be the cluster of $\Pi$ containing $x$.

Definition 2.3 (Gupta et al. [2003]). An $(r,\varepsilon)$-padded probabilistic ball-partition
of a metric $(X,d)$ is a probability distribution $\mu$ over $\mathcal{P}$ satisfying:

(1) (bounded radius) Each $\Pi$ in the support of $\mu$ is an $r$-ball partition.

(2) (padding) $\forall x \in X$, $\Pr_{\mu}[d(x, X \setminus C_\Pi(x)) \ge \varepsilon r] \ge 1/2$.

(This is called a padded probabilistic decomposition in Gupta et al. [2003].)
Each cluster $C$ in every partition $\Pi$ in the support of a probabilistic ball-partition
$\mu$ has radius at most $r$; and for any $x \in X$, a random $r$-ball partition $\Pi$ drawn from
the distribution $\mu$ does not cut $B(x,\varepsilon r)$ (and hence $B(x,\varepsilon r)$ is contained in cluster
$C_\Pi(x) \in \Pi$) with probability $\ge 1/2$.

2.3  Padded  Probabilistic  Hierarchical  Decompositions 

In this section, we define a $(\rho,\varepsilon)$-padded probabilistic hierarchical decomposition
(PPHD) of the metric $(X,d)$, on which the routing algorithm is based. A PPHD is a
probability distribution over HDs that has a “probabilistic padding” property similar
to that in Definition 2.3. For any pair of nodes $s,t$ in $X$ and any ball containing
both $s$ and $t$ with a diameter of $\approx d(s,t)$, the PPHD ensures that this ball is contained
in a single cluster of radius only slightly ($\approx$ an $\alpha$ factor) larger than $d(s,t)$ at a
suitable level with probability $\ge 1/2$. Thus the shortest $s$-$t$ path is contained entirely
in this cluster of radius not much more than $d(s,t)$. This is the general intuition for
PPHDs and the starting point for the routing algorithm.

For our applications, we refine PPHDs so that they consist of only $m =
O(\alpha \log \alpha)$ HDs. In Section 2.3.1, we first give an existence proof, using the Lovász
Local Lemma (LLL), to show that such decompositions exist. In Section 2.3.2, we then
outline a randomized polynomial-time algorithm to find the decompositions using Beck's
techniques (Beck [1991]).

The existence proof for the PPHDs has the following outline. We first give
a randomized algorithm to form a single random hierarchical decomposition $\Pi$,
which proves the existence of PPHDs, albeit with support over an exponential number
of HDs. To reduce the size to something that depends only on $\alpha$, we have to use
the locality property of the metric space and the LLL. One significant complication
in the proof is that we cannot use the standard top-down decomposition schemes
to construct PPHDs, since they have long-range correlations that preclude the
application of the LLL. Our solution to this problem is to build the decomposition
trees in a bottom-up fashion and to make sure that the coarser partitions respect the
cluster boundaries made in the finer partitions.
2.3.1  Existence  of  PPHDs 

Motivated by the routing application, we are interested in finding the following
structure, which we call a $(\rho,\varepsilon)$-padded probabilistic hierarchical decomposition.
This is a probability distribution $\mu$ over $\rho$-hierarchical decompositions (as defined
in Definition 2.2) so that given $B(x,\varepsilon r)$ with $r \approx \rho^i$, if we choose a random $\rho$-HD
$\Pi$ from $\mu$ and examine the partition $\Pi_i$ in it, $B(x,\varepsilon r)$ is cut in this partition $\Pi_i$ with
probability at most $1/2$.

Definition 2.4 (PPHD). A $(\rho,\varepsilon)$-padded probabilistic hierarchical decomposition
(referred to as a $(\rho,\varepsilon)$-PPHD) is a distribution $\mu$ over $\rho$-hierarchical decompositions,
such that for any point $x \in X$ and any value $r$ s.t. $\rho^{i-1} \le r \le \rho^{i}$,

$\Pr_{\Pi \in \mu}[B(x,\varepsilon r) \text{ is cut in } \Pi_i] \le 1/2$,

where the random $\rho$-hierarchical decomposition chosen is $\Pi = (\Pi_i)_{i=0}^{h}$. The degree
of the PPHD $\mu$ is defined to be $\deg(\mu) = \max_{\Pi \in \mu} \deg(\Pi)$.

Note that the definition of a PPHD extends both the idea of a padded probabilistic
ball-partition and that of HDs: we ask for a distribution over entire HDs,
instead of over ball-partitions at a certain scale $r$. However, having picked a random
$\rho$-HD $\Pi = (\Pi_i)_{i=0}^{h}$ from this distribution, we demand that balls of radius $\approx \varepsilon\rho^i$ be
cut with small probability only in the partition $\Pi_i$ that is “at the correct distance scale”.
Our main theorem of this section is the following:

Theorem 2.2. Given a metric $(X,d)$, there exists a $(\rho,\varepsilon)$-PPHD $\mu$ for $(X,d)$ with
$\rho = O(\alpha)$ and $\varepsilon = O(1/\alpha)$. The degree $\deg(\mu)$ of the PPHD is at most $\alpha^{O(\alpha)}$.
Furthermore, there exists a distribution $\mu_m$ whose support is over only $m = O(\alpha \log \alpha)$
HDs.

Since any hierarchical decomposition $\Pi$ can be associated with a tree $T_\Pi$ (as
mentioned in Section 2.2.1), the above theorem can be viewed as guaranteeing a
set of $m$ trees such that the level-$i$ clusters in half of these trees do not cut a given
ball of radius $\approx \varepsilon\rho^i$.

We prove Theorem 2.2 in the rest of this section. We first prove in Theorem 2.3
that one can obtain the result where the PPHD $\mu$ has support over many HDs.
We then use the Lovász Local Lemma to show that a PPHD distribution $\mu_m$ with
support over only a small number of HDs exists.

Padded Probabilistic Hierarchical Partitions. If we do not care about the number
of HDs in the support of a PPHD, the existence result of Theorem 2.2 has been
proved earlier (Talwar [2004]) with better guarantees; the proof basically follows
from the padded decompositions given in Gupta et al. [2003]. However, we now
give another proof that introduces ideas that are ultimately useful in obtaining a
PPHD distribution whose support is over a small number of HDs.

Theorem 2.3. Given a metric $(X,d)$, there exists a $(\rho,\varepsilon)$-PPHD $\mu$ for $(X,d)$ with
$\rho = O(\alpha)$ and $\varepsilon = O(1/\alpha)$, and with degree $\deg(\mu) = \alpha^{O(\alpha)}$. Furthermore, one can
sample from $\mu$ in polynomial time.

Proof. We define a randomized process that builds a random hierarchical decomposition
tree in a bottom-up fashion, instead of the usual top-down way. To build an
HD $\Pi$, we start with $\Pi_0 = \{\{x\} : x \in X\}$ and perform an inductive step. At any
step, we are given a partial structure $(\Pi_i, \ldots, \Pi_0)$ where for each $j \le i$, the clusters
in $\Pi_{j-1}$ (which is an $\eta_{j-1}$-ball partition) are contained within the clusters of $\Pi_j$.
We then build a new partition $\Pi_{i+1}$, with all clusters of $\Pi_i$ being contained within
clusters of $\Pi_{i+1}$. We have to ensure that clusters of $\Pi_{i+1}$ are contained in balls of
radius at most $\eta_{i+1}$ and that any ball of radius $\varepsilon r$ for $\rho^i \le r \le \rho^{i+1}$ is cut in $\Pi_{i+1}$
with probability at most $1/2$. This way, we end up with a valid random HD $\Pi$. The
claimed probability distribution $\mu$ is the one naturally generated by this algorithm.
To create the clusters of $\Pi_{i+1}$, we use a decomposition procedure whose property
is summarized in the following lemma.


0. Let $Y \leftarrow X$, $p \leftarrow c\alpha\Gamma/\Lambda$ for a constant $c$ to be fixed later, and let $N$ be a $\Lambda/2$-net of $X$.

1. Pick an arbitrary “root” vertex $v \in N$ not picked before

2. Set the initial value of the “radius” $L \leftarrow \Lambda/2$

3. Flip a coin with bias $p$

4. If the coin comes up heads, goto Step 11

5. If the coin comes up tails, increment $L$ by $\Gamma$

6. If $L > \Lambda(1 - 1/4\alpha)$

7.     choose a value $\hat{L}$ from $[0, \Lambda/(4\alpha)]$ u.a.r.

8.     round down $\hat{L}$ to the nearest multiple of $\Gamma$

9.     set $L \leftarrow \Lambda(1 - 1/4\alpha) + \hat{L}$

10. Else goto Step 3

11. Form a new cluster $C'$ in $\Pi''$ containing all clusters in $\Pi' \cap Y$ whose centers lie in $B(v,L)$

12. Remove the vertices in $C'$ from $Y$

13. (Remark: $C'$ has radius at most $\Lambda + \Gamma$)

14. If $Y \ne \emptyset$ goto Step 1

15. End


Figure 2.3.1. Algorithm Cut-Clusters
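
For concreteness, the procedure in the figure can be sketched in Python. This is an illustrative sketch under stated assumptions, not the definitive algorithm: the coin bias is assumed to have the form $c\alpha\Gamma/\Lambda$, fine clusters are represented as hypothetical `(center, members)` pairs, the $\Lambda/2$-net is supplied by the caller, and all identifiers are invented for this example.

```python
import math
import random
from typing import Callable, Hashable, List, Sequence, Set, Tuple

Point = Hashable
Cluster = Tuple[Point, Set[Point]]  # (center, members)


def cut_clusters(fine: Sequence[Cluster], net: Sequence[Point],
                 dist: Callable[[Point, Point], float],
                 gamma: float, lam: float, alpha: float, c: float,
                 rng: random.Random) -> List[Cluster]:
    """Merge fine clusters (radius <= gamma) into clusters of radius <= lam + gamma."""
    p = min(1.0, c * alpha * gamma / lam)    # coin bias (assumed form)
    cap = lam * (1.0 - 1.0 / (4.0 * alpha))  # beyond this radius the cut is forced
    remaining = list(fine)
    coarse: List[Cluster] = []
    for v in net:                            # each root is picked at most once
        if not remaining:
            break
        radius = lam / 2.0
        while rng.random() >= p:             # tails: grow the radius by gamma
            radius += gamma
            if radius > cap:                 # forced cut: random multiple of gamma
                extra = rng.uniform(0.0, lam / (4.0 * alpha))
                radius = cap + math.floor(extra / gamma) * gamma
                break
        # Heads (or a forced cut): absorb every remaining fine cluster whose
        # center lies in B(v, radius).
        inside = [cl for cl in remaining if dist(v, cl[0]) <= radius]
        if inside:
            members = set().union(*(m for _, m in inside))
            coarse.append((v, members))
            remaining = [cl for cl in remaining if dist(v, cl[0]) > radius]
    return coarse


# Toy run: 16 points on a line, singleton fine clusters, and a 4-net as roots.
pts = list(range(16))
fine = [(x, {x}) for x in pts]
coarse = cut_clusters(fine, [0, 4, 8, 12], lambda a, b: abs(a - b),
                      gamma=1.0, lam=8.0, alpha=1.0, c=1.0,
                      rng=random.Random(0))
```

Because every root starts with radius $\Lambda/2$ and the roots form a $\Lambda/2$-net, every fine cluster is absorbed by some root, so the output is a partition regardless of the coin flips.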

Lemma 2.1. Given a metric $(X,d)$ with a $\Gamma$-ball partition $\Pi'$ of $X$ into clusters
lying in balls of radius at most $\Gamma \ge 1$, and a value $\Lambda \ge 8\Gamma$, there is a randomized
algorithm to create a $(\Lambda + \Gamma)$-ball partition $\Pi''$ of $X$, where each cluster of $\Pi'$ is
contained in some cluster of $\Pi''$, and for any $x \in X$ and radius $0 \le r \le \Lambda$,

$\Pr[B(x,r) \text{ is cut in } \Pi''] \le \frac{O(r + \Gamma)}{\Lambda}\,\alpha$.

Proof. Note that we can assume that $\Gamma \le \Lambda/c\alpha$ and $\Lambda \ge \alpha$, since otherwise the
lemma is trivially true. Using the algorithm Cut-Clusters given in Figure 2.3.1,
we create a partition of $Y$ (and hence of $X$); all distances are measured according
to the original distance function $d$ in $X$.

Let us define $\mathcal{B}_x = B(x,r)$. Note that if $\mathcal{B}_x$ is cut in $\Pi''$ due to some value of $L$
from $v \in N$ (for the first time), then $L$ falls into the interval $[d(v,x) - r - \Gamma,\ d(v,x) +
r + \Gamma]$. Indeed, if $\mathcal{B}_x$ is cut in $\Pi''$, there are at least two clusters $C'_1, C'_2 \in \Pi'$ such
that they both intersect $\mathcal{B}_x$, and $B(v,L)$ contains one of their centers but not both. Since
both clusters intersect $\mathcal{B}_x$, their centers $c'_1$ and $c'_2$ are at distance at most $r + \Gamma$ from
$x$. If $L < d(v,x) - r - \Gamma$, the triangle inequality implies that $B(v,L)$ cannot contain
either center. Similarly, if $L > d(v,x) + r + \Gamma$, $B(v,L)$ contains both of them. Hence
the value of $L$ must fall into the interval indicated above.

If a cut in Steps 11-12 is made due to the appearance of a heads in Step 4, we call
such a cut a normal cut, else we call it a forced cut. We now bound the probability
that the ball $\mathcal{B}_x = B(x,r)$ is cut due to either type.

Normal cuts. Consider the first instant in time when the parameter $L$ for some
root $v \in N$ reaches a value such that the cut obtained by taking all $\Pi' \cap Y$ clusters
with centers in $B(v,L)$ would cut $\mathcal{B}_x$. (If there is no such time, then $\mathcal{B}_x$ is never cut
by a normal cut.) In this case, $L$ must also be in the range $d(v,x) \pm (r + \Gamma)$, and
increases with time. Now either (i) we make a normal cut before $L$ goes outside
this range; or (ii) we make a forced cut; or (iii) $L$ goes outside the range and we
make no cut in this range. In any case, the fate of $\mathcal{B}_x$ is decided; $\mathcal{B}_x$ is either cut
or contained in a new cluster with center $v$. We now upper-bound the probability
that event (i) happens. There are at most $2(r + \Gamma)/\Gamma$ coin flips made (with bias $p$)
when the value of $L$ is in the correct range of width at most $2(r + \Gamma)$, and one of
these flips must come up heads for the cut to be made. The trivial union bound now
shows this probability to be at most $\frac{2(r+\Gamma)}{\Gamma}\,p = \frac{2c(r+\Gamma)}{\Lambda}\,\alpha$.

Forced cuts. Let us look at some root $v$ and bound the probability that a forced
cut is made with cutting radius $L$ from $v$ in some range $\mathcal{R} = d(v,x) \pm (r + \Gamma)$. Since
the cut is forced and the value of $L$ is greater than $\Lambda(1 - 1/4\alpha) \ge 3\Lambda/4$, we must
have flipped a sequence of at least $\Lambda/4\Gamma$ successive tails; the probability of this
event is at most

$(1 - p)^{\Lambda/4\Gamma} \le e^{-p\Lambda/4\Gamma} = e^{-\frac{c}{4}\alpha}$.    (2.3.1)

Now, we choose $L$ to be a multiple of $\Gamma$ uniformly in a range of width at most
$\Lambda/4\alpha$, and hence the probability that $L$ falls into a range of length $2(r + \Gamma)$ is at
most $2(r + \Gamma)/(\Lambda/4\alpha)$. Multiplying this by (2.3.1), we obtain a bound of
$e^{-\frac{c}{4}\alpha} \times \frac{8(r+\Gamma)}{\Lambda}\,\alpha$ on the probability that a forced cut is made around $v$ with $L$ in the
range $\mathcal{R}$ such that the cluster $C'$ with center $v$ in $\Pi''$ may cut $\mathcal{B}_x$. Finally, for any $x \in X$,
$\mathcal{B}_x$ can only be cut by clusters from roots $v \in N$ that are at distance at most
$(r + \Gamma) + \Lambda \le 3\Lambda$ from $x$; by Prop. 2.1, there are at most
$|B(x,3\Lambda) \cap N| \le \left(\frac{2 \cdot 3\Lambda}{\Lambda/2}\right)^{\alpha} = 12^{\alpha}$ such
roots. Now we choose $c$ to be large enough; the probability of $\mathcal{B}_x$ being cut by a
forced cut due to any such root is at most
$12^{\alpha} \times e^{-\frac{c}{4}\alpha} \times \frac{8(r+\Gamma)}{\Lambda}\,\alpha \le \frac{8(r+\Gamma)}{\Lambda}\,\alpha$ by the
union bound. □

We now use the above lemma to prove Theorem 2.3. Using $\Pi' = \Pi_i$, $\Gamma =
\eta_i < \rho^i(\rho/(\rho - 1))$, and $\Lambda = \eta_{i+1} - \Gamma = \rho^{i+1}$, and using $N = N_{i+1}$ (which is a
$\rho^{i+1}/2 = \Lambda/2$ net), we create a $(\Gamma + \Lambda = \eta_{i+1})$-ball partition such that for all $x$ and
all $r \le \rho^{i+1}$ and $\varepsilon = O(1/\alpha)$, we have

$\Pr[B(x,\varepsilon r) \text{ cut}] \le \frac{O(\varepsilon r + \Gamma)}{\Lambda}\,\alpha \le O\!\left(\varepsilon + \frac{1}{\rho - 1}\right)\alpha \le \frac{1}{2}$,    (2.3.2)

for $\rho/\alpha$ and $c$ being large enough constants. The probability distribution $\mu$ over
all decompositions $\Pi$ thus generated satisfies the requirements of a PPHD as given
in Definition 2.4. Finally, we bound the degree $\deg(\mu)$ of the PPHD $\mu$; note that
each level-$i$ cluster is centered at some $v \in N_i$, hence the number of level-$i$ clusters
contained in some level-$(i+1)$ cluster is at most $(2\eta_{i+1}/(\rho^i/2))^{O(\alpha)} = \alpha^{O(\alpha)}$ by Prop. 2.1.

□


Few Hierarchical Decompositions. The above proof immediately gives us a
PPHD with support on only $M = O(\log n + \log\log\Delta)$ HDs. By sampling from
the distribution $\mu$ $M$ times, we get the HDs $\Pi^{(1)}, \ldots, \Pi^{(M)}$, and we let the PPHD $\mu_M$
be the uniform distribution on these HDs. By (2.3.2), after adjusting the constants,
for each $j \in [1 \ldots M]$, point $x \in X$ and radius $r \le \rho^i$, $B(x,\varepsilon r)$ is cut in the
partition $\Pi_i^{(j)}$ with probability at most $1/10$; hence a Chernoff bound implies that this
ball is cut in the level-$i$ partitions of more than $M/2$ of the HDs with probability less
than $1/(n\log\Delta)^{2}$, for a suitable constant in $M$. Now taking
the trivial union bound over all possible values of the center $x \in X$, and all the $\log\Delta$
values of $r$ which are powers of 2, shows that $\mu_M$ is a $(\rho,\varepsilon/2)$-PPHD whp.

Even Fewer Hierarchical Decompositions. While the proof of Theorem 2.3
and the discussion above do not produce a PPHD with small support (of size
$O(\alpha\log\alpha)$), we have seen all the essential ideas required to prove the existence of
such a distribution $\mu_m$ and hence to complete the proof of Theorem 2.2. To prove
this result, we use the locality of the construction, in conjunction with the Lovász
Local Lemma (LLL). This locality property is the very reason why we built the
hierarchical decomposition bottom-up; it ensures that if any particular ball is not
cut at some low level $i$ (the “local decisions”), it is not cut at levels higher than $i$
(i.e., the “non-local decisions”). Also, we choose the decomposition procedure of
Lemma 2.1 in preference to others (e.g., those in Gupta et al. [2003] and Talwar
[2004]) since the latter choose a single random radius for all clusters in one particular
partition $\Pi$ of $X$, which causes correlations across the entire metric space.

Proof of Theorem 2.2: To show that there is a distribution over only $m =
O(\alpha\log\alpha)$ trees, we use an idea similar to that in the previous section, augmented
with some ideas from Gupta et al. [2003]. Instead of building one hierarchical
decomposition $\Pi$ bottom-up, we build $m$ hierarchical decompositions $\Pi^{(1)}, \ldots, \Pi^{(m)}$
simultaneously (also from the bottom up).

As before, the proof proceeds inductively; we assume that we are given level-$i$
partitions $\Pi_i^{(1)}, \ldots, \Pi_i^{(m)}$, where $\Pi_i^{(j)}$ is the level-$i$ partition belonging to $\Pi^{(j)}$. We
then show that we can build level-$(i+1)$ partitions $\Pi_{i+1}^{(1)}, \ldots, \Pi_{i+1}^{(m)}$, where each $\Pi_i^{(j)}$
is a refinement of the corresponding $\Pi_{i+1}^{(j)}$ and any given ball $B(x,\varepsilon r)$ with $\rho^i \le r \le
\rho^{i+1}$ is cut in at most $m/2$ of these level-$(i+1)$ partitions. We start off this process
with each $\Pi_0^{(j)} = \{\{x\} : x \in X\}$ being the partition consisting of all singleton points
in $X$. Let $J = \{1, \ldots, m\}$. Given $m$ level-$i$ partitions $(\Pi_i^{(j)})_{j \in J}$, we create $m$
level-$(i+1)$ partitions $(\Pi_{i+1}^{(j)})_{j \in J}$ using the procedure in Lemma 2.1 independently on each
of the $m$ decompositions; parameters are set as in the proof of Theorem 2.3, with
$\Lambda = \rho^{i+1}$, $\Gamma = \eta_i$, and $\varepsilon = 1/O(\alpha)$. This extends the $m$ hierarchical decompositions
to the $(i+1)$-st level; it remains to show that the probability of balls being cut is
small.

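To describe the events of interest, let us take $\beta = \varepsilon\rho^{i+1}$ and define $Z$ to be a
$\beta$-net of $X$. For each $z \in Z$, define $\mathcal{B}_z$ to be $B(z,2\beta)$, and $\mathcal{E}_z$ to be the event that $\mathcal{B}_z$
is cut in more than $m/2$ of the partitions $(\Pi_{i+1}^{(j)})_{j \in J}$, which we refer to as a “bad”
event (used in Section 2.3.2). We prove the following claim using the Lovász Local Lemma.

Claim 2.1. Given any $(\Pi_i^{(j)})_{j \in J}$, $\Pr[\bigwedge_{z \in Z} \neg\mathcal{E}_z] > 0$.

Lemma 2.2 (Lovász Local Lemma). Given a set of events $\{\mathcal{E}_z\}_{z \in Z}$, suppose
that each event is mutually independent of all but at most $B$ other events. Further
suppose that, for each event $\mathcal{E}_z$, $\Pr[\mathcal{E}_z] \le p$. Then if $ep(B + 1) \le 1$,
$\Pr[\bigwedge_{z \in Z} \neg\mathcal{E}_z] > 0$.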

Proof of Claim 2.1: First, let us calculate the probability of $\mathcal{E}_z$: by changing the
constant in $\varepsilon$, we can make the probability that a ball $\mathcal{B}_z$ is cut in one level-$(i+1)$
partition to be at most $1/8$. Let us denote by $A_z^j$ the event that $\mathcal{B}_z$ is cut in partition
$\Pi_{i+1}^{(j)}$. The expected number of partitions in which the ball is cut is at most $m/8$.
Since the partitions are constructed independently, the probability for the event
$\mathcal{E}_z$ that $\mathcal{B}_z$ is cut in $m/2$ partitions (which is at least four times the expectation)
is at most $\exp(-9m/40)$; this can be established using a standard Chernoff bound.
This, in turn, is at most $(0.8)^m$, which we define to be $p$.
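
The constant $9/40$ can be traced through one common form of the multiplicative Chernoff bound. The derivation below is a sketch assuming the form $\Pr[X \ge (1+\delta)\mu] \le \exp(-\delta^2\mu/(2+\delta))$ for a sum $X$ of independent indicator variables, applied with the upper bound $m/8$ in place of the mean; the text does not state which variant of the bound is used.

```latex
\[
\Pr\bigl[X \ge (1+\delta)\mu\bigr] \le \exp\!\Bigl(-\frac{\delta^{2}\mu}{2+\delta}\Bigr),
\qquad \mu = \frac{m}{8},\quad \delta = 3,
\]
\[
\Pr\Bigl[X \ge \frac{m}{2}\Bigr]
\le \exp\!\Bigl(-\frac{9}{5}\cdot\frac{m}{8}\Bigr)
= e^{-9m/40}
= \bigl(e^{-0.225}\bigr)^{m}
\le (0.8)^{m},
\qquad\text{since } e^{-0.225} \approx 0.7985 .
\]
```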

Next we show that an event $\mathcal{E}_z$ is mutually independent of all events $\mathcal{E}_{z'}$
such that $d(z,z') > 4\eta_{i+1}$. For each partition $\Pi_{i+1}^{(j)}$, each root $v \in N_{i+1}$ determines
its radius by conducting a random experiment independent of any other roots'
experiments. These random experiments, and only these, determine whether events
such as $A_z^j$ occur. In turn, whether event $\mathcal{E}_z$ occurs is determined only by events
$A_z^1, \ldots, A_z^m$. For a particular $j$, for each $z$, all of the cuts that could affect $\mathcal{B}_z$ in
the algorithm Cut-Clusters are made from roots $v \in N_{i+1}$ at distance at most
$2\beta + \Gamma + \Lambda = 2\beta + \eta_{i+1} \le 2\eta_{i+1}$ from $z$. Whether event $A_z^j$ occurs is determined by
the experiments corresponding to these roots alone. If $d(z,z') > 4\eta_{i+1}$, then there
is no intersection between the experiments for $z$ and the experiments for $z'$. Since
$\mathcal{E}_z$ is determined by $A_z^1, \ldots, A_z^m$, $\mathcal{E}_z$ is mutually independent of the set of all
$\mathcal{E}_{z'}$ such that $d(z,z') > 4\eta_{i+1}$.

We apply the LLL now. Note that the number of $z' \in Z$ within distance $4\eta_{i+1}$
of $z$, for $z \in Z$, is at most $|B(z,4\eta_{i+1}) \cap Z| \le (8\eta_{i+1}/\beta)^{\alpha} \le O(\alpha)^{\alpha}$. We define this
quantity to be $B$; $ep(B + 1)$ is at most 1 for $m = O(\alpha\log\alpha)$ and Claim 2.1 follows.
□
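
To see how the number of decompositions scales, one can solve the local-lemma condition $e\,p\,(B+1) \le 1$ for $m$ when $p = (0.8)^m$. The sketch below assumes a dependency bound of the form $(C\alpha)^{\alpha}$ with a hypothetical constant $C = 16$; the function name `smallest_m` and the sample value of $\alpha$ are illustrative only.

```python
import math


def smallest_m(dependency_bound: float, q: float = 0.8) -> int:
    """Smallest integer m >= 1 with e * q**m * (dependency_bound + 1) <= 1."""
    # Taking logarithms: m >= (1 + ln(B + 1)) / (-ln q).
    m = math.ceil((1.0 + math.log(dependency_bound + 1.0)) / -math.log(q))
    return max(m, 1)


alpha = 2
C = 16                          # hypothetical constant, for illustration only
B_example = (C * alpha) ** alpha
m_example = smallest_m(B_example)
```

Since $\ln B = O(\alpha \log \alpha)$ under this assumption, the resulting $m$ grows as $O(\alpha \log \alpha)$, in line with the claim.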

Having proved the claim, let us now show that with nonzero probability, each
$B(x,\varepsilon r)$ for $x \in X$ and $\rho^i \le r \le \rho^{i+1}$ is not cut in at least $m/2$ of the level-$(i+1)$
partitions $(\Pi_{i+1}^{(j)})_{j \in J}$. Let us call this event $SC_{i+1}$. The claim shows that with nonzero
probability, each ball $\mathcal{B}_z$ with $z \in Z$ is not cut in at least $m/2$ of the partitions
$(\Pi_{i+1}^{(j)})_{j \in J}$. Since each $x \in X$ is at distance at most $\beta$ to some $z_x \in Z$, the triangle
inequality implies that $B(x,\varepsilon r) \subseteq B(x,\beta)$ is not cut if $B(z_x,2\beta)$ is not cut, which holds
in at least half of the partitions. Hence $SC_{i+1}$ also holds with nonzero probability.

Finally, we prove that we can choose a random set of HDs $(\Pi^{(j)})_{j \in J}$ such that
$SC_{i+1}$ occurs for each $1 \le i+1 \le h$ simultaneously with nonzero probability. The
key to the proof is that we have assumed an arbitrary (worst-case) set of partitions
$(\Pi_i^{(j)})_{j=1}^{m}$ at level $i$ in proving a nonzero lower bound on $\Pr[SC_{i+1}]$. Hence, we
can ignore any dependence among the events $SC_{i+1}$ for $1 \le i+1 \le h$, and simply
multiply their nonzero probabilities together to obtain a nonzero lower bound on
the probability that they all occur simultaneously. □

2.3.2  An  Algorithm  for  Finding  the  Decompositions 

The above procedure can be made algorithmic using an approach based on Beck's
algorithmic version of the LLL (see, e.g., Alon and Spencer [1992]; Beck [1991]).
The decomposition satisfies all properties of the one that is shown to exist using
the LLL in Theorem 2.2, although with some changes in constant parameter values.
As in the proof of Theorem 2.2, we build $m = O(\alpha\log\alpha)$ HDs level by level in a
bottom-up fashion.

On any particular level $i+1$, we begin by choosing $m$ partitions at random.
After making the random choices, we examine the partitions and identify all of the
bad events that have occurred. We then group together bad events that may depend
on each other, as well as “good” events that may depend on the bad events. Each
group forms a connected component in the LLL dependency graph. We show that,
with high probability, all connected components have size $O(\log\nu)$, where $\nu = |Z|$
is the size of the $\varepsilon\rho^{i+1}$-net of $X$.

Once the groups have been identified, we need to eliminate the bad events.
Hence, for each group, we “undo” all of the random choices concerning that group,
while not modifying any choices that do not affect the group. New choices must be
made for each group so that no bad event occurs. Because the group size is small
(the number of centers $v \in N_{i+1}$ concerning the group for which we choose random
radii is also $O(\log\nu)$), we can find new settings for these choices using exhaustive
search in polynomial time.

One interesting complication in this proof is that the set of clusters containing
a group have different shapes in the $m$ different partitions. In each partition, we
cut out a “hole”, and redo the choices within the hole. The boundary of the hole is
formed from the boundaries of the clusters that may influence the bad events (and
the good events) in the group. In forming the boundary, additional good events may
be added to the hole. As a consequence, it is possible that a good event inside a hole
in one partition may appear inside a different hole in another partition. Hence, when
we perform exhaustive search, these holes must be considered together. However,
our method of bounding the size of each connected component already takes into
account any merging of holes on account of shared good events, so that we never
have to redo the choices for a group of size more than $O(\log\nu)$.

Another issue is that the subset of centers in a hole that belong to $N_{i+1}$, the
$\rho^{i+1}/2$-net that covers the entire metric, may not by themselves cover the hole.
(Portions of the hole may be covered by centers outside the hole.) So for each of
the $m$ partitions, we may have to add additional net points inside the hole to obtain
a complete cover for it. We show that the number of net points in the hole increases
by only a constant factor and remains $O(\log\nu)$, and the degree of the hierarchical
decomposition trees is at most $\alpha^{O(\alpha)}$ as before.
2.4  The (1 + τ)-Stretch Routing Schemes

Given a (ρ,ε)-PPHD with a support on m HDs, we can now define, for every
0 < τ < 1, a (1 + τ)-stretch routing scheme which uses routing tables of size at
most m(α/τ)^{O(α)} log Δ log δ bits at every node.

We consider routing schemes in two models. In a basic model, we assume that
there is no underlying routing fabric and each node can only send packets to its
direct neighbors. In a second model, we can build an overlay hierarchical routing
scheme upon an underlying routing fabric like IP that can send packets to any
specific node in the network. We specify the routing algorithm in the basic model,
but also indicate how one can circumvent certain steps of this algorithm when an
underlying routing mechanism is given.

Let us recall some of the notation defined earlier. Let Π^{(1)}, ..., Π^{(m)} be the m
hierarchical decompositions on which μ_m has positive support, and let the level-i
partition corresponding to Π^{(j)} be called Π_i^{(j)}. Recall that we can associate each
hierarchical decomposition Π^{(j)} with a tree T_j (as outlined in Section 2.2.1). Note
that each of these trees has degree deg(μ_m) bounded by α^{O(α)} and height at most
h = log_ρ Δ. Recall that each internal vertex of the tree T_j at level i corresponds to
a cluster of Π_i^{(j)}, and leaves of T_j, j ∈ J, correspond to vertices in X, where
J = {1, ..., m}. Let each internal vertex v of each tree T_j label its children by
numbers between 1 and deg(μ_m); v does not label anything with the number 0, but
uses it to refer to its parent. Note that this allows us to represent any path in a
tree T_j by a sequence of at most 2h = O(log_ρ Δ) labels.
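The labeling convention above can be illustrated with a small sketch. The class `Node` and the helpers `ancestors` and `label_path` are hypothetical names introduced only for this illustration; the sketch assumes an explicit in-memory tree, which is not how a distributed implementation would hold T_j.

```python
class Node:
    """A vertex of a decomposition tree; a parent labels its children 1, 2, ..."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.label = 0  # label assigned by the parent (unused for the root)
        if parent is not None:
            parent.children.append(self)
            self.label = len(parent.children)


def ancestors(v):
    """Return the chain [v, parent(v), ..., root]."""
    chain = []
    while v is not None:
        chain.append(v)
        v = v.parent
    return chain


def label_path(src, dst):
    """Encode the tree path src -> dst: 0 means 'parent', k >= 1 means 'child k'."""
    up, down = ancestors(src), ancestors(dst)
    up_set = set(up)
    # Lowest common ancestor: first ancestor of dst that is also an ancestor of src.
    k = next(i for i, n in enumerate(down) if n in up_set)
    lca = down[k]
    return [0] * up.index(lca) + [n.label for n in reversed(down[:k])]
```

For two leaves of a tree of height h, the returned sequence has at most h zeros followed by at most h child labels, which matches the 2h bound stated above.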

2.4.1  The  Addressing  Scheme 

Given a tree T_j and a vertex x ∈ X, we assign x a local address addr_j(x), which
consists of h = log_ρ Δ blocks, one for each level of the tree T_j. Each block has a
fixed length. The i-th block of addr_j(x) corresponds to partition Π_i^{(j)} and contains
the label assigned to the cluster C_x containing x in Π_i^{(j)} by C_x’s parent in T_j.
Since any such label is just a number between 1 and deg(μ_m), where
deg(μ_m) = α^{O(α)}, we need O(α log α) bits per block. In fact, one can extend this
addressing scheme to any cluster C in T_j. If C is a level-i cluster, the k-th block of
addr_j(C) contains *’s for k < i; addr_j(X) for the root cluster of T_j contains all *’s,
matching all vertices in X.

The global address addr(v) of a point v ∈ X is the concatenation
(addr_1(v), ..., addr_m(v)) of its local addresses addr_j(v) for j ∈ J. Since each
cluster C belongs to only one tree T_j, we define addr_{j'}(C) for every j' ≠ j to be a
sequence of #’s of the correct length (where #’s are dummy symbols matching
nothing), and hence define a global address of C as well. (This is only for
simplicity; in actual implementations, cluster addresses for T_j can be given by
the tuple (addr_j(C), j).)
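A minimal sketch of this addressing scheme follows. It assumes, purely for illustration, that a local address is stored as a tuple of block labels listed from the top level (h − 1) down to level 0, so that the wildcard blocks of a cluster address are the trailing ones. The names `cluster_address`, `matches` and `global_cluster_address` are hypothetical.

```python
WILDCARD = "*"   # matches every label
DUMMY = "#"      # matches nothing


def cluster_address(leaf_address, level):
    """Local address of the level-`level` cluster containing the given leaf.

    Blocks for levels below `level` (the last `level` entries) become wildcards.
    """
    keep = len(leaf_address) - level
    return tuple(leaf_address[:keep]) + (WILDCARD,) * level


def matches(cluster_addr, leaf_address):
    """True if every non-wildcard block of the cluster address agrees with the leaf."""
    return all(c == WILDCARD or c == b for c, b in zip(cluster_addr, leaf_address))


def global_cluster_address(local_addr, j, m):
    """Global address of a cluster of tree j: slot j is real, the others are dummies."""
    h = len(local_addr)
    return tuple(tuple(local_addr) if k == j else (DUMMY,) * h for k in range(m))
```

With h = 3, the leaf (2, 1, 3) lies in the level-2 cluster (2, *, *), and the root cluster (*, *, *) matches every leaf. Because `#` never equals a label, a global cluster address matches a node only through the slot of its own tree.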

Since there are O(α log α) bits per block, h blocks per local address, and m
local addresses per global address, substitution of the appropriate values gives the
address length A to be at most

    m × h × ⌈log(deg(μ_m))⌉ = O(α log α) × log_ρ Δ × O(α log α) = O(α^2 log α log Δ) bits.


2.4.2  The  Routing  Table 

For each point x ∈ X, we maintain a routing table Route_x that contains the
following information for each T_j, 1 ≤ j ≤ m:

(1)  For each ancestor of x in T_j that corresponds to a cluster C containing x, we
maintain a table entry for C.

(2)  Moreover, for each such C, we maintain an entry for each descendant of C
in T_j reachable within ℓ hops in tree T_j. Here ℓ = O(log_ρ 1/(ετ)), with the
constants chosen such that r_{i−ℓ} ≤ τερ^{i−1}/4, the bound used in the proof of
Theorem 2.4.

In the routing table Route_x for x, each of the above entries thus corresponds to
some level-i' cluster C' in T_j. Let close_x(C') be the closest point in C' to x. (We
assume, w.l.o.g., that ties are broken in some consistent way, so that any node y
on a shortest path from x to close_x(C') has the value close_y(C') = close_x(C'); in
fact, this consistency is the only property we use.) For this C', Route_x stores (a) the
global address addr(C') by which the table is indexed, (b) the identity of a “next
hop” neighbor y of x that lies on a shortest path from x to the closest point
close_x(C') in C', and (c) an extra bit ValidPath_x(C'): if the cluster ℓ levels above
C' in T_j is the cluster C, then ValidPath_x(C') is set to be true if B(x, ερ^{i'+ℓ}) is
entirely contained within cluster C and d(x, close_x(C')) ≤ ερ^{i'+ℓ}, and is set to be
false otherwise. Of course, if we reach the root of T_j while trying to go up ℓ
levels, then the bit is set to be true. Note that if there is an underlying routing
fabric like IP, we can store the IP address of some node in C' (say, the closest one)
instead of (b) and (c) above.


Lemma 2.3.  The number of entries in the routing table Route_x of any x is at
most log Δ × (α/τ)^{O(α)}.


Proof.  Let us estimate the number of entries in Route_x for any x ∈ X. There
are m trees. For each tree T_j, j ∈ J, there are h = log_ρ Δ ancestors of x, and the
degree of the tree is bounded by deg(μ_m) = α^{O(α)}. Recall that ρ and 1/ε are both
O(α), and hence ℓ = O(log(α/τ)). Plugging these values in, we get that the number
of entries for x across the m trees is at most

    m × h × (deg(μ_m))^ℓ = O(α log α) × O(log_ρ Δ) × α^{O(αℓ)} = log Δ × (α/τ)^{O(α)}.

Each entry is indexed by one global address (of at most A = O(α^2 log α log Δ) bits,
which we do not store in Route_x since we can deduce it from addr(x) based on the
clustering structure); each entry contains the identity of the next hop (which
uses O(log δ) bits, where δ is the maximum degree of G), a path length field (to be
specified in Section 3.2), and one additional ValidPath bit.  □


The forwarding algorithm makes use of two functions, NextHop_x and
PrefMatch_x. For a point x and a level-i' cluster C' in T_j, the function
NextHop_x(addr(C')) returns the next hop on the path from x to close_x(C'),
provided that the next hop does not leave the cluster C at level i' + ℓ that contains C',
and null otherwise. (As we shall see, the packet forwarding algorithm is guaranteed
never to encounter a null next hop.) Given points x and t in X, the function
PrefMatch_x(t) returns an addr(C') in Route_x such that in some T_j, t belongs to
the level-i cluster C', ValidPath_x(C') is true, and the value i is the smallest across
all trees. Note that both of these functions can be computed efficiently by node
x. Furthermore, it is possible to support the functions with data structures of size
comparable to that of Route_x.
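The selection rule of PrefMatch can be sketched as a linear scan over the table. The record type `Entry`, its field names, and the tuple representation of addresses (with `"*"` for wildcard blocks) are assumptions made for this illustration; the text does not prescribe a data layout, and a real implementation would use a more compact index.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entry:
    tree: int                  # index j of the decomposition tree T_j
    level: int                 # level of the cluster C'
    addr: Tuple                # local cluster address, "*" for wildcard blocks
    next_hop: Optional[str]    # neighbor toward the cluster, or None
    valid_path: bool           # the ValidPath bit


def in_cluster(cluster_addr, leaf_addr):
    """True if all non-wildcard blocks of the cluster address agree with the leaf."""
    return all(c == "*" or c == b for c, b in zip(cluster_addr, leaf_addr))


def pref_match(route, target_addrs):
    """Return the lowest-level entry with ValidPath set whose cluster contains t.

    target_addrs[j] is the local address of the target t in tree T_j.
    Returns None if no entry qualifies.
    """
    candidates = [e for e in route
                  if e.valid_path and in_cluster(e.addr, target_addrs[e.tree])]
    return min(candidates, key=lambda e: e.level) if candidates else None
```

Since the root cluster of every tree contains t and has its ValidPath bit set, the candidate list is never empty for a table built as described above.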

Note that once the points in X have been assigned addresses (for which we
have described only an off-line algorithm), the routing tables can be built up in
a completely distributed fashion. In particular, a distributed breadth-first-search
algorithm can be applied to determine whether a ball of a certain radius is cut in
a particular decomposition, and a distributed implementation of the Bellman-Ford
algorithm can be used to establish the next-hop entries for destinations for which
the shortest paths lie within a certain cluster.
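As a centralized stand-in for the distributed search, the following sketch checks whether a ball is cut and evaluates the ValidPath condition of Section 2.4.2. It assumes the graph is given as a dictionary of adjacency dictionaries with positive edge lengths and that clusters are plain sets of nodes; the function names are hypothetical, and Dijkstra's algorithm is used here in place of a distributed breadth-first search.

```python
import heapq


def ball(graph, center, radius):
    """Map each node within distance `radius` of `center` to its distance.

    graph: dict node -> dict neighbor -> positive edge length.
    """
    dist = {center: 0}
    heap = [(0, center)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd <= radius and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def valid_path(graph, x, target_cluster, ancestor_cluster, radius):
    """ValidPath condition: B(x, radius) is not cut by the ancestor cluster C,
    and some point of the target cluster C' is within `radius` of x."""
    reach = ball(graph, x, radius)
    uncut = set(reach) <= set(ancestor_cluster)
    close_enough = any(v in reach for v in target_cluster)
    return uncut and close_enough
```

Here `radius` plays the role of ερ^{i'+ℓ} and `ancestor_cluster` the role of the cluster C that lies ℓ levels above C'.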

2.4.3  The  Forwarding  Algorithm 

The idea behind the forwarding algorithm is to start a packet off from its origin
s towards an intermediate cluster C containing its destination t; the packet header
thus consists of two pieces of information (addr(t), addr(C)), where t is the
destination node for the packet and C is the intermediate cluster containing t. Initially,
the cluster can be chosen (degenerately) to be the root cluster of (say) tree T_1.

Upon reaching a node x in the intermediate cluster C, a new and smaller
intermediate cluster C', also containing t, must be chosen, possibly from a different
tree; the packet header must be updated with addr(C'), which remains the same until
the packet reaches C'. Suppose that the new cluster C' containing t is at level i'. After
selecting this cluster, the packet is sent off towards C' with the new header, following a
shortest path that stays within the cluster C at level i' + ℓ that contains both x and C'.
This process is repeated until ultimately the packet reaches the cluster containing
only the destination t. The algorithm is presented in Figure 2.4.2.


1.  Let packet header be (addr(t), addr(C)).
2.  If C contains x, the current node, then
3.      find addr(C') ← PrefMatch_x(t)
4.      let y ← NextHop_x(addr(C'))
5.      forward packet with new header (addr(t), addr(C')) to y.
6.  Else (now x ∉ C)
7.      let y ← NextHop_x(addr(C))
8.      forward packet with unchanged header (addr(t), addr(C)) to y.
9.  End


Figure 2.4.2.  The Forwarding Algorithm at Node x
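The steps of Figure 2.4.2 can be sketched as a single function that is called once per hop. The membership test, PrefMatch and NextHop are passed in as callables because their realization depends on how tables are stored; the names `forward_step` and `route`, and the hop limit `max_hops`, are assumptions made for this illustration.

```python
def forward_step(x, header, contains, pref_match, next_hop):
    """One execution of the forwarding algorithm at node x.

    header     : pair (t, C) of destination and intermediate cluster
    contains   : contains(C, x) -> bool, membership test for x in C
    pref_match : pref_match(x, t) -> new intermediate cluster C' containing t
    next_hop   : next_hop(x, C) -> neighbor of x on the way to C
    Returns the neighbor y and the header to forward with.
    """
    t, cluster = header
    if contains(cluster, x):            # Step 2: x has reached C
        cluster = pref_match(x, t)      # Step 3: pick a smaller cluster C'
    y = next_hop(x, cluster)            # Step 4 or Step 7
    return y, (t, cluster)              # Step 5 or Step 8


def route(s, t, root, contains, pref_match, next_hop, max_hops=100):
    """Drive forward_step from s until t is reached; return the visited nodes."""
    path, x, header = [s], s, (t, root)
    while x != t and len(path) <= max_hops:
        x, header = forward_step(x, header, contains, pref_match, next_hop)
        path.append(x)
    return path
```

The driver starts with the root cluster in the header, which is the degenerate initial choice mentioned in the text, and stops as soon as the current node is the destination.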


Theorem 2.4.  The forwarding algorithm has a stretch of at most (1 + τ), where
τ < 1.
Proof.  We first show that the algorithm is indeed valid: each of the steps can be
executed and the packet eventually reaches t. Suppose that the packet has just
reached a node x in an intermediate cluster C containing t (with addr(C) in its
header); thus x needs to execute Step 3 to find a new cluster C' containing t. Clearly,
PrefMatch_x(t) can return the root cluster C_root of any T_j, since it contains t. We
show, however, that the cluster C' returned by PrefMatch_x(t) has a small diameter
and nodes along a valid shortest path from x to C' will forward the packet correctly
until it reaches C'.

Lemma 2.4.  If the packet is at node x with distance to the target t being
d(x,t) ≤ ερ^i, Step 3 must return some addr(C') such that cluster C' ∋ t is at level
(i − ℓ) or lower in some T_{j'} with ValidPath_x(C') being true. Furthermore, every
vertex v on every shortest path from x to close_x(C') = close_v(C') has a non-null
NextHop_v(addr(C')).

Proof.  The (ρ,ε)-PPHD ensures that there exists at least one tree T_j such that
B(x, ερ^i) is not cut in the level-i partition Π_i^{(j)}; let C_cont ∈ Π_i^{(j)} be the level-i
cluster in T_j that contains B(x, ερ^i). Let C_t ∈ Π_{i−ℓ}^{(j)} be the level-(i − ℓ) cluster
in T_j containing t. The ValidPath_x(C_t) bit must be true since B(x, ερ^i) ⊆ C_cont in
Π_i^{(j)} and d(x, close_x(C_t)) ≤ d(x,t) ≤ ερ^i; thus PrefMatch_x can (and may indeed)
just return addr(C_t) given no “better” choices. However, PrefMatch_x always finds a
cluster C' in some T_{j'}, at the lowest level across all trees, such that t ∈ C' and
ValidPath_x(C') is true in Route_x. Let the level of C' be i'; the value i' is at most
(i − ℓ). Now let C ∈ Π_{i'+ℓ}^{(j')} be the cluster ℓ levels above C' ∈ Π_{i'}^{(j')} in T_{j'}
that contains both x and C'. (Such C must exist at level i' + ℓ for addr(C') to be in
Route_x.) We know that B(x, ερ^{i'+ℓ}) ⊆ C and d(x, close_x(C')) ≤ ερ^{i'+ℓ} since
ValidPath_x(C') is true in Route_x. Thus all shortest paths from x to close_x(C') are
entirely contained in C. Hence, the NextHop_v(addr(C')) pointer at any node v on one
of these paths must be non-null, since all shortest paths from v to
close_v(C') = close_x(C') are contained in C, the cluster ℓ levels above C' in T_{j'}.  □

It remains to bound the path stretch. Consider the case when a packet is sent
from s to t. Let C' be a cluster at level i − ℓ returned by Step 3 of the forwarding
algorithm. Note that if the level i ≤ ℓ, then C' = {t} and we send the packet directly
to t with τ = 0. Using these short distances as the base case, we now do induction
on the distance from s to t.

If C' is a non-trivial cluster containing t, then we go on a shortest path from
s to some vertex v = close_s(C') ∈ C'. Since t ∈ C', d(s,v) ≤ d(s,t). Because the
diameter of C' is at most 2r_{i−ℓ}, we have d(v,t) ≤ 2r_{i−ℓ} < ερ^{i−1} < d(s,t). (The
last inequality holds because if ερ^{i−1} ≥ d(s,t), then PrefMatch_s would have returned
a cluster at a level lower than that of C' by Lemma 2.4.) Hence, we can apply the
induction hypothesis to find a path from v to t of length at most
(1 + τ)d(v,t) ≤ (1 + τ)2r_{i−ℓ}. The path from s to t as derived from Route_s is of
length at most d(s,v) + (1 + τ)d(v,t) ≤ d(s,t) + (1 + τ)2r_{i−ℓ}. The stretch of the
path from s to t is then at most 1 + (1 + τ)2r_{i−ℓ}/d(s,t). This quantity is at most
1 + τ since τ < 1 and we have chosen constants so that r_{i−ℓ} ≤ τερ^{i−1}/4.  □
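For reference, the last step of the proof can be written as one chain of inequalities. It uses only bounds stated in the proof, reading the stretch parameter as τ and the threshold in the parenthetical remark as ερ^{i−1}: d(s,v) ≤ d(s,t), d(v,t) ≤ 2r_{i−ℓ}, r_{i−ℓ} ≤ τερ^{i−1}/4, ερ^{i−1} < d(s,t), and τ < 1.

```latex
\begin{align*}
\frac{d(s,v) + (1+\tau)\,d(v,t)}{d(s,t)}
  &\le 1 + \frac{(1+\tau)\,2r_{i-\ell}}{d(s,t)} \\
  &\le 1 + \frac{(1+\tau)\,\tau\varepsilon\rho^{\,i-1}/2}{\varepsilon\rho^{\,i-1}} \\
  &= 1 + \frac{\tau(1+\tau)}{2} \;\le\; 1 + \tau .
\end{align*}
```

The second line replaces 2r_{i−ℓ} by its upper bound τερ^{i−1}/2 and d(s,t) by its lower bound ερ^{i−1}; the final inequality holds because (1 + τ)/2 ≤ 1 whenever τ ≤ 1.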
3  Routing Table Construction Using Bellman-Ford


3.1  Introduction 

The hierarchical routing scheme we describe in this section completes what is
lacking in Section 2.4; hence we focus primarily on the process of building up
routing tables using a distributed implementation of the Bellman-Ford algorithm for
the basic model that we introduce in Section 3.2. For overlay routing, we store the
IP address of an intermediate node to reach each destination in the routing tables,
and the process of routing table updates is similar to that of prefix routing, e.g.,
in Hildrum et al. [2002]. Although the forwarding algorithm remains the same as that
in Section 2.4.3, we will elaborate in more detail on its behavior in Section 3.3
when it is coupled with the new routing algorithm.

Our routing scheme is similar in spirit to the Closest Entry Routing (CER)
scheme described in KK (Kleinrock and Kamoun [1977]). They define a hierarchical
routing scheme by first specifying an “optimal” underlying hierarchical clustering
structure that they impose on the network nodes, where the optimization
objective is to minimize the routing table length; each level-k cluster is defined
recursively as a set of level-(k − 1) clusters, with the level-0 clusters being
individual nodes. This leads naturally to a tree representation as shown in Figure 3.1.1,
where internal tree nodes represent clusters; Table 3.1 shows that the destination
addresses in the routing table of node A correspond to clusters at different levels
of the decomposition tree, hence reflecting the structure of the hierarchical
clustering of network nodes. In KK, two nodes share common routing table entries for
all the clusters that contain both of them. KK assumes that all clusters at the same
level have the same number of sub-clusters within them, and each cluster is a
connected component. The KK hierarchical routing procedure leads a message down a
tree path, fixing more prefix digits at each step, much as in prefix routing, traversing
smaller and smaller clusters that contain the destination node until it reaches the
destination itself.


Figure 3.1.1.  A 4-level Hierarchical Clustering Structure of Network Nodes


Level      Destination addresses
Level 3    2***
Level 2    00**  01**  02**  03**
Level 1    000*  001*  002*  003*
Level 0    0000  0001  0002  0003

Table 3.1.  Routing Table Entries in Node A in Figure 3.1.1

The reduction of routing table size generally leads to an increase in network
path length. In order to derive bounds on the increase in the average path length,
they further assume that a shortest path between two nodes in a cluster lies within
the cluster. They also prescribe an upper bound of d_k on the (strong) diameter of a
level-k cluster, with d_k decreasing as k decreases. They show that routing schemes
based on the hierarchical clustering model cause essentially no increase in the
average network path length for a family of large distributed networks. Specifically,
the networks they consider are all connected graphs upon which it is possible to
fit a hierarchical clustering whose outcome satisfies the assumptions above. In
addition, (a) the resulting clusters at any level satisfy the following: the diameter
of any cluster S chosen is bounded above by O(|S|^ν) for some constant ν ∈ [0, 1],
and (b) the average distance between nodes in the network is O(N^ν), where N is
the size of such a network.

In contrast, our hierarchical routing schemes give bounds on the path stretch
at a per node-pair level on certain networks that are connected graphs G, where
the natural metric (X, d) induced by shortest path distances between any pair of
nodes in G is a doubling metric. In addition, the main improvement of our work over
that of KK is the following: while the KK routing scheme is based on assumptions
regarding the existence of a “good” partition of the network, the method itself does
not provide an algorithm for computing such a partition; we are able to prove the
existence of a (ρ,ε)-PPHD with a support on m Hierarchical Decompositions and
actually find them by following the Clustering algorithm and its constructive
algorithm described in Section 2.3. Note that while we guarantee a degree bound for
the decomposition trees across all levels, we do not require that the degrees be
exactly the same.

It would be ideal if, once we construct such a set of network partitions, we could
run the hierarchical routing algorithm specified in KK on each individual
decomposition tree. However, it is not possible to directly apply KK’s routing scheme or
their proof techniques for three reasons. First, while KK assumes that each cluster
subnetwork is fully connected, this is not satisfied in our decomposition. Second,
the shortest paths between two nodes in a cluster are not guaranteed to stay within
the cluster. Finally, although the maximal distance in G between vertices of C_k, for
all 0 ≤ k ≤ h, is bounded by the diameter bound 2r_k of C_k, which is geometrically
decreasing as k decreases, it is a weak diameter bound and not necessarily satisfied
by the distance induced by the subgraph corresponding to each cluster C_k.

We thus adopt as many definitions and as much notation as possible from KK in this
section while inventing some new techniques for addressing the above issues in the
design and specification of a modified hierarchical routing scheme given a
(ρ,ε)-PPHD μ_m with a support on m HDs, and in the analysis of the characteristics of
paths as induced by the routing tables thus created. The important property of a
(ρ,ε)-PPHD that we will use in defining our routing scheme is that, for
ρ^{i−1} < r ≤ ρ^i, there is at least one tree T_j such that B(s, εr) is contained in a
level-i cluster C_i in the level-i partition Π_i^{(j)}; since a ball is a connected
component, all shortest paths from s to vertices within B(s, εr) must be contained
within C_i in the level-i partition Π_i^{(j)}.

3.2  Routing  Table  Construction 

In  this  section,  we  focus  on  the  process  of  building  up  routing  tables  once  the  nodes 
in  the  network  have  been  assigned  addresses  that  reflect  their  positions  in  each  of 
the  m  decomposition  trees.  During  this  process,  routing  information  is  aggregated 
and  exchanged  between  special  nodes  in  different  clusters  at  each  level.  We  refer 
to  such  special  nodes  as  exchange  nodes  (for  routing)  or  entry  points  (for  packet 
forwarding)  of  their  corresponding  clusters.  The  algorithm  for  selecting  exchange 
nodes  for  each  cluster  is  an  independent  issue  that  we  do  not  address  in  this  paper. 
Similar  to  the  CER  hierarchical  routing  scheme  described  in  KK,  no  routing  infor¬ 
mation  describing  the  internal  behavior  of  a  cluster  is  propagated  outside  a  cluster; 
hence  a  cluster  is  regarded  from  outside  as  a  single  node  whose  distance  to  itself  is 
zero. 

We use a modified version of the distributed Bellman-Ford algorithm, shown in
Figure 3.2.2, to perform routing updates: specifically, to establish the next-hop
entries and update estimated path lengths for destination clusters in the routing
tables for the basic model. For routing updates, we focus on entries for one
specific decomposition tree T_j that corresponds to Π^{(j)} = (Π_i^{(j)})_{i=0}^{h}.

Let s and t be two neighboring nodes (that is, they are connected by a channel
(s,t)) which belong to the same level-k cluster C_k ∈ Π_k^{(j)} and not to any common
lower-level cluster in T_j, where k ∈ {1, 2, ..., h}. Let C_{k−1}(s), C_{k−1}(t) ∈ Π_{k−1}^{(j)}
respectively denote the level-(k − 1) clusters to which s and t each belong in tree
T_j. Let C_k(s,t) denote the level-k cluster that contains both s and t; note that
C_{k−1}(s), C_{k−1}(t) ⊆ C_k(s,t) in T_j since T_j represents a laminar decomposition. We
use lca^j(s,t) to denote the lowest common ancestor of s and t in a particular tree
T_j; hence lca^j(s,t) = C_k(s,t) ∈ Π_k^{(j)}. For a pair of nodes s, t, lca^j(s,t) can be
determined by inspecting the common prefixes of local addresses addr_j(s) and
addr_j(t).
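The prefix inspection can be sketched as follows. The sketch assumes, for illustration only, that a local address is a tuple of block labels ordered from the top level (h − 1) down to level 0, and that wildcard blocks of a cluster address are written as `"*"`; the function name `lca_level` is hypothetical.

```python
def lca_level(addr_a, addr_b):
    """Level of the lowest common ancestor of two addresses in one tree.

    If the first p blocks agree, the two addresses share clusters at levels
    h-1, ..., h-p, so the lowest common ancestor sits at level h - p
    (level h is the root, which is shared by every pair).
    """
    h = len(addr_a)
    p = 0
    while p < h and addr_a[p] == addr_b[p]:
        p += 1
    return h - p
```

The same function covers the lowest common ancestor of a node s and a cluster C': a wildcard block never equals a label, so the scan stops at the first wildcard at the latest, and if s lies in C' the result is the level of C' itself.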

Recall that in node s, for any cluster C_i(s) in T_j that contains s at level i, for
all i = 0, ..., h, routing table entries are kept for all clusters that are descendants
of C_i(s) ∈ Π_i^{(j)} within ℓ levels down the decomposition tree T_j, for all j. Thus each
entry in the routing table Route_s for T_j corresponds to some level-i' cluster
C' ∈ Π_{i'}^{(j)} in T_j, where i' = 0, 1, ..., h − 1; that entry is also denoted as C' and
indexed by the global address addr(C') of its associated cluster C', and contains the
following fields in Route_s: (a) a next hop NextHop_s(addr(C')) to reach C' from s,
(b) a path length field HF(s,C') that is the current path length at node s for
reaching cluster C' through NextHop_s(addr(C')), and (c) a ValidPath_s(C') bit.
Initially, the path length fields for all the entries in Route_s for tree T_j are set
to ∞ except for the self entries, as shown in the Initialization Procedure in
Figure 3.2.2.

We use C_i(s,C') = C_i(s) ∈ Π_i^{(j)} to denote the level-i common ancestor of s and
C' ∈ Π_{i'}^{(j)} such that i ≥ i' + 1 and C_i(s) ⊇ C'. Note that C_h(s,C') = C_h(s) contains
C' ∈ Π_{i'}^{(j)} for all i' ≤ h − 1, since C_h(s) contains the entire network. Similarly, we
use lca^j(s,C') to denote the lowest common ancestor of s and C' ∈ Π_{i'}^{(j)} in tree
T_j, where C' ⊆ lca^j(s,C') ⊆ C_i(s,C') for all i such that C' ⊆ C_i(s). For node s and
cluster C', lca^j(s,C') can be determined by inspecting the common prefixes of
local addresses addr_j(s) and addr_j(C').

As a consequence of the routing table specification, routing table entries at nodes
s and t at all levels below k − ℓ in T_j refer to different cluster destinations, whereas
all the other entries from level k − ℓ up to h refer to the same cluster destinations
in T_j. The objective of the updating procedure is to compare the estimated lengths
of the paths from s or t to any common destination and to update the routing tables
to reflect the shorter paths. Whenever s receives a route update from t, for each
common destination cluster C', its corresponding entry is potentially updated with
a new next hop NextHop_s(addr(C')), the path length HF(s,C') through the new
NextHop_s(addr(C')) as in Steps 2-4, and the ValidPath_s(C') bit as in Steps 5-9 of
the Route Update Procedure in Figure 3.2.2.

We have a slightly different way of setting the ValidPath_s(C') bit from that
specified in Section 2.4.2, to maximize the chance of setting it true. However, as
before, once the ValidPath_s(C') bit is set to be true, a shortest path from s to C'
is indeed guaranteed by following the next hop in Route_s for an entry C' and that
in Route_v of each subsequent node v along the path from s to an entry point of C'.

Let a common destination entry for T_j in Route_s and Route_t correspond to a
level-i' cluster C' ∈ Π_{i'}^{(j)}, where i' ≥ k − ℓ. We denote the level of lca^j(s,C') in
T_j as l_0. The inequalities i' + 1 ≤ l_0 ≤ i' + ℓ must be satisfied for C' to be
an entry in Route_s. The ValidPath_s(C') bit is set to be true so long as for “any”
of the common ancestors C_i(s,C') of s and C' at level i, l_0 ≤ i ≤ i' + ℓ, both
HF(s,C') ≤ ερ^i and B(s, ερ^i) ⊆ C_i(s,C') are true. It is set to be false otherwise.
Note that when i' ≥ h − ℓ, both HF(s,C') ≤ Δ and B(s, Δ) ⊆ C_h(s,C') are always
true since C_h(s,C') is the entire network; hence we set the ValidPath_s(C') bit true
for all C' at level h − ℓ and above in Step 5 of the Initialization Procedure.

The reason we set the ValidPath_s(C') bit this way is the following. Recall that
by constructing the m decomposition trees, each node s “knows” if B(s, ερ^i)
is contained in C_i(s) ∈ Π_i^{(j)} in tree T_j; naturally, if B(s, ερ^i) ⊆ C_i(s) ∈ Π_i^{(j)}, then
B(s, ερ^i) ⊆ C_l(s) ∈ Π_l^{(j)} is true for all l ≥ i. However, if B(s, ερ^i) ⊄ C_i(s), we do
not assume that we know information such as “whether a ball B(s, r) of a radius
ερ^i > r > ερ^{i−1} is contained in C_i(s) or not”, since that is not the type of
information that our constructive algorithm provides by default; note that if
r ≤ ερ^{i−1}, we will just check if B(s, ερ^{i−1}) ⊆ C_{i−1}(s) to decide if B(s, r) ⊆ C_i(s).
Our routing algorithm thus makes minimal assumptions about the information that is
available at each node about balls around it being contained at a certain level or not.

Another specification in terms of routing that is different from that of
Section 2.4.2 is the following. Assume we route a packet from s toward C'. Instead
of assuming the packet should always enter a cluster C' through the closest point
x = close_s(C') in C' to s, we only require that the packet enters C' through a closest
entry point e_0 ∈ C'. Correspondingly, for node s and a level-i' cluster C' ∈ Π_{i'}^{(j)}
in T_j, the function NextHop_s(addr(C')) returns the next hop on the path from s
to e_0 provided that the next hop does not leave the cluster C at level (i' + ℓ) that
contains C', and null otherwise. Recall that an entry point e_0 ∈ C' advertises routes
for the cluster C' it belongs to. Note also that e_0 does not need to be the closest
node to s in C' in order to achieve (1 + τ)-stretch routing. (This is also true for
overlay routing.) As a basic routing scheme, we keep a next hop NextHop_s(addr(C'))
in Route_s toward a closest entry point e_0 ∈ C' for the sake of routing table
consistency, on which we will elaborate shortly.

For overlay routing, we keep the IP address of an arbitrary entry point e_0 to
C' (instead of a next hop NextHop_s(addr(C')) toward e_0), since IP routing will
deliver a packet from s to e_0 directly given the IP address of e_0 without having to
rely on hop-by-hop forwarding as in the basic model that we focus on in this section.
Initialization Procedure: initialize Route_s for tree T_j at node s

1.  For i = 0, 1, ..., h
2.      HF(s, C_i(s)) = 0, and ValidPath_s(C_i(s)) = true
3.  For all other entries C' ∌ s, let i' = level of C' in tree T_j
4.      HF(s,C') = ∞
5.      If i' ≥ h − ℓ, then ValidPath_s(C') = true
6.  End

Route Update Procedure: upon receiving a route update from t such that lca^j(s,t) = C_k

1.  For each common entry C' ∈ Π_{i'}^{(j)}, which represents a level-i' cluster in T_j, where i' ≥ k − ℓ
2.      If HF(s,C') > d(s,t) + HF(t,C'), then
3.          HF(s,C') ← d(s,t) + HF(t,C')
4.          nexthop field of C' ← t
5.      If i' < h − ℓ, then
6.          Let l_0 = level of lca^j(s,C') in T_j and let m satisfy ερ^{m−1} < HF(s,C') ≤ ερ^m
7.          for all levels i: max{l_0, m} ≤ i ≤ i' + ℓ
8.              If B(s, ερ^m) ⊆ B(s, ερ^i) ⊆ C_i(s) in T_j, then
9.                  ValidPath_s(C') = true
10.                 Goto 1
11. End

Figure 3.2.2.  Distributed Bellman-Ford Algorithm for T_j at Node s
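A compact sketch of the two procedures for a single tree is given below. Several details are assumptions made for illustration rather than part of the specification: clusters are identified by opaque keys, the length of the channel (s,t) is passed as `edge_len`, the level of lca(s,C') and the test "B(s, ερ^i) ⊆ C_i(s)" are supplied as callables, and the class name `RouteTable` is hypothetical. The condition "max{l_0, m} ≤ i" of Step 7 is expressed equivalently as "i ≥ l_0 and HF(s,C') ≤ ερ^i".

```python
import math


class RouteTable:
    """Per-tree routing state at one node s, following the shape of Figure 3.2.2."""

    def __init__(self, entries, own_clusters, h, ell):
        # entries: dict cluster key -> level; own_clusters: keys of clusters containing s
        self.level = dict(entries)
        self.h, self.ell = h, ell
        self.hf, self.next_hop, self.valid = {}, {}, {}
        for c, lvl in self.level.items():
            own = c in own_clusters
            self.hf[c] = 0 if own else math.inf      # self entries have length 0
            self.next_hop[c] = None
            self.valid[c] = own or lvl >= h - ell    # top levels are always valid

    def update(self, t, edge_len, hf_t, lca_level, ball_inside, eps, rho):
        """Process a route update from neighbor t.

        hf_t        : dict of t's path lengths for the entries it advertises
        lca_level   : lca_level(c) -> level l_0 of lca(s, c)
        ball_inside : ball_inside(i) -> True if B(s, eps * rho**i) lies in C_i(s)
        """
        for c, d_t in hf_t.items():
            if c not in self.hf:
                continue                              # not a common entry
            if self.hf[c] > edge_len + d_t:           # Steps 2-4
                self.hf[c] = edge_len + d_t
                self.next_hop[c] = t
            lvl = self.level[c]
            if lvl < self.h - self.ell and math.isfinite(self.hf[c]):   # Steps 5-9
                for i in range(lca_level(c), lvl + self.ell + 1):
                    if self.hf[c] <= eps * rho ** i and ball_inside(i):
                        self.valid[c] = True
                        break
```

As in the figure, the sketch only ever sets the ValidPath bit; it does not clear a bit that was set by an earlier update.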

Definition  3.1.  We  call  a  path  an  internal  path  in  cluster  C  if  all  the  nodes  in  that 
path  belong  to  C. 

Similar to KK, we define the equilibrium condition as the situation when no
changes occur in the topology of the network and the contents of HF(s,C') in the
routing table reach “minimal” constant values after a certain number of updates.

Claim 3.1. The distributed Bellman-Ford algorithm guarantees that in equilibrium condition, $\mathrm{HF}(s,C')$ will be the length of the shortest path from $s$ to a closest entry point $e_0$ of $C'$ when $\mathrm{ValidPath}_s(C')$ is true, i.e., $\mathrm{HF}(s,C') = d(s,e_0)$ in $\mathrm{Route}_s$.

Proof. Let the level of $C' \in \Pi_{i'}^{(j)}$ in tree $T_j$ be $i' < h - \ell$ and let the level of $\mathrm{lca}^j(s,C')$ be $l_0$. We only set $\mathrm{ValidPath}_s(C')$ true in the routing algorithm when, for "any" of the level-$i$ clusters $C_i(s) \in \Pi_i^{(j)}$, where $l_0 \le i \le i' + \ell$, both $\mathrm{HF}(s,C') \le \varepsilon\rho^{i}$ and $B(s,\varepsilon\rho^{i}) \subseteq C_i(s) \in \Pi_i^{(j)}$ hold. Denote the lowest such level by $r$, where $r \in [l_0, i' + \ell]$. All shortest paths from $s$ to some entry point $x' \in C'$ of distance $d(s,x') \le \mathrm{HF}(s,C') \le \varepsilon\rho^{r}$ are thus internal to $C_r(s) \in \Pi_r^{(j)}$ in $T_j$, since such paths are contained in $B(s,\varepsilon\rho^{r})$, which is a connected component entirely contained in $C_r(s) \in \Pi_r^{(j)}$. Note that some $x' \in C'$ must have advertised itself as an entry point to $C'$ for such paths to be established within $\{C_r(s) - C'\}$ and for $C' \in \Pi_{i'}^{(j)}$ to appear in $\mathrm{Route}_s$. Thus $C' \subset C_r(s)$ since $x' \in \{C' \cap C_r(s)\} \ne \emptyset$ and $r \ge l_0$; we thus denote $C_r(s)$ as $C_r(s,C')$ from this point on.

In addition, every node $v \in C_r(s,C')$, including those along the shortest paths from $s$ to $x'$ inside $B(s,\varepsilon\rho^{r})$, contains a routing table entry to $C'$, since $C'$ is a descendant of $C_r(s,C')$ within $\ell$ levels down the decomposition tree $T_j$. Propagation and subsequent updating of routing information among nodes of $C_r(s,C')$ is equivalent to finding a minimum path internal to $C_r(s,C')$ from any node $v \in \{C_r(s,C') - C'\}$ to an entry point of $C'$ that is closest to node $v$; for $s$, the closest entry point to $C'$ is $e_0$.

Improvements are made sequentially at each update over the distance $\mathrm{HF}(u,C')$ from $u$ to $C'$ among nodes within $B(s,\varepsilon\rho^{r})$, until it reaches a minimal constant value if no changes occur in the topology of the network; hence every $u \in B(s,\varepsilon\rho^{r})$ "knows" how to route to $C'$ with a path of bounded length. Given multiple entry points to $C'$, the distributed Bellman-Ford algorithm guarantees that we find a shortest path not only to some entry point $x'$ of $C'$, but also to the closest one, $e_0$ of $C'$, from $s$ in equilibrium condition, i.e., $\mathrm{HF}(s,C') = d(s,e_0)$. The entire path stays within $B(s,\varepsilon\rho^{r}) \subseteq C_r(s,C')$, where $r$ is specified as above.

Note that when $i' \ge h - \ell$, both $\mathrm{HF}(s,C') \le \Delta$ and $B(s,\Delta) \subseteq C_h(s,C')$ are always true since $C_h(s,C')$ is the entire network; hence we set $\mathrm{ValidPath}_s(C')$ true for all $C'$ at level $h - \ell$ and above. The same argument as above applies to this case. □

The reason we require a closest entry point to $C'$ is primarily for route convergence purposes when our protocol serves as an underlying routing scheme. For overlay routing, we allow an entry point to be any exchange node or simply a random node within the cluster, which is commonly assumed in peer-to-peer networks. Note that an exchange node of a given cluster is a node of that cluster which is connected to one or more nodes external to that cluster, as defined in KK. We will use exchange node and entry point interchangeably unless we specify otherwise. The $(1+x)$-stretch property that we are going to prove for hierarchical routing paths does not require the entry point for a cluster $C'$ to be the closest to $s$ either, a point that we will not elaborate on from now on.

Fact 3.1. If a shortest path from $s$ to $e_0$, an entry point to a level-$(i')$ cluster $C' \in \Pi_{i'}^{(j)}$, is internal to $C_i(s) \in \Pi_i^{(j)}$ in tree $T_j$, where $i > i'$, then cluster $C' \ni e_0$ must be a sub-cluster that is entirely contained in $C_i(s)$ in $T_j$, i.e., $C' \subset C_i(s)$.

Proof. First observe that $e_0 \in \{C_i(s) \cap C'\}$, since the shortest path from $s$ to $e_0$ is internal to $C_i(s) \in \Pi_i^{(j)}$ in $T_j$. Since $T_j$ represents a laminar decomposition, where a lower-level cluster is always entirely contained in a higher-level cluster, $e_0 \in C_i(s)$ is sufficient to guarantee that $\{C' \ni e_0\} \subset C_i(s)$. □
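
The laminar property used in this proof can be checked mechanically on a small example. The sketch below is illustrative only: the helper names `cluster_of` and `is_laminar` and the four-node hierarchy are assumptions made for the example, and each partition is modeled as a plain list of `frozenset` objects rather than the address-based representation used by the routing scheme.

```python
def cluster_of(partition, node):
    """Return the cluster (a frozenset) of `partition` that contains `node`."""
    for cluster in partition:
        if node in cluster:
            return cluster
    raise KeyError(node)


def is_laminar(levels):
    """levels[i] is the level-i partition; each cluster must nest in some cluster one level up."""
    return all(
        any(low <= high for high in upper)
        for lower, upper in zip(levels, levels[1:])
        for low in lower
    )


# Hypothetical three-level decomposition of the node set {a, b, c, d}.
levels = [
    [frozenset("a"), frozenset("b"), frozenset("c"), frozenset("d")],
    [frozenset("ab"), frozenset("cd")],
    [frozenset("abcd")],
]

c_prime = cluster_of(levels[1], "c")  # level-1 cluster C' containing the entry point "c"
c_i_s = cluster_of(levels[2], "a")    # level-2 cluster C_i(s) containing s = "a"

# Because the entry point lies in both clusters and the family is laminar,
# the lower-level cluster is contained in the higher-level one.
print(is_laminar(levels), "c" in c_i_s, c_prime <= c_i_s)
```

The containment test `c_prime <= c_i_s` mirrors the conclusion of Fact 3.1: a shared point plus laminarity forces the lower-level cluster to sit entirely inside the higher-level one.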
We forward packets according to the Forwarding algorithm in Figure 2.4.2. Let $s$ and $t$ be two arbitrary nodes. For destination $t$, let $C'(t)$ be the cluster whose $\mathrm{addr}(C'(t))$ is returned by the function $\mathrm{PrefMatch}_s(t)$ at Step 3 of the Forwarding algorithm. We assume, w.l.o.g., $C'(t) \in \Pi_{i'}^{(j)}$, i.e., $C'(t)$ is in the level-$(i')$ partition $\Pi_{i'}^{(j)}$, where $i' \le h - \ell$, in tree $T_j$. Recall that $h = \lceil \log_\rho \Delta \rceil$. Let $l_0 \le h$ be the level of $\mathrm{lca}^j(s,C'(t)) \in \Pi_{l_0}^{(j)}$ in $T_j$.

We say $C'(t) \ni t$ is the cluster that has the longest valid prefix matching with $t$ in $\mathrm{Route}_s$, since the level of $C'(t)$ is the lowest across all trees among clusters $C'$ in $\mathrm{Route}_s$ such that $C' \ni t$ and $\mathrm{ValidPath}_s(C')$ is true. Before we proceed, we first give more definitions, some of which are adapted from KK.

$h^c_{st}$: Length of the estimated minimum path from node $s$ to node $t$ as derived from the routing information at node $s$. (The superscript $c$ stands for clustered routing.)

Exchange node $e_c$: a node of a cluster $C$ that is connected to one or more nodes external to $C$.

$A_i(t)$: Subset of all exchange nodes (entry points) that connect a level-$i$ cluster $C_i(t) \in \Pi_i^{(j)}$ in tree $T_j$, for all $j = 1, \ldots, m$, with any other level-$i$ cluster within the same ancestor $C_n(t) \in \Pi_n^{(j)}$ in the same tree $T_j$, for all $n \le i + \ell$. From the above definitions, all entry points of $C'(t) \in \Pi_{i'}^{(j)}$ that connect $C'(t)$ to any other level-$(i')$ cluster that stays within $C_{i'+\ell}(t) \in \Pi_{i'+\ell}^{(j)}$ in tree $T_j$ hence belong to $A_{i'}(t)$.

Let $e_0 \in A_{i'}(t) \cap C'(t)$ be the closest entry point for $s$ to reach $C'(t) \in \Pi_{i'}^{(j)}$ in $T_j$.

$C_k(s,t)$: For $k \le h - 1$, $C_k(s,t) \in \Pi_k^{(j)}$ is the level-$k$ cluster in $T_j$, where $l_0 \le k \le i' + \ell$, that is the lowest-level common cluster of $s$ and $t$ such that $B(s,\varepsilon\rho^{k}) \subseteq C_k(s,t)$ and $B(s,\varepsilon\rho^{k})$ contains a shortest path from $s$ to $C'(t) \in \Pi_{i'}^{(j)}$ in $T_j$, where $i' < h - \ell$; such a $C_k(s,t) \in \Pi_k^{(j)}$ always exists since we know that $B(s,\varepsilon\rho^{r}) \subseteq C_r(s,t)$ and $\mathrm{HF}(s,C'(t)) \le \varepsilon\rho^{r}$ must both hold for some $l_0 \le r \le i' + \ell$, given that $\mathrm{ValidPath}_s(C'(t))$ is true in $\mathrm{Route}_s$, due to the specification of the distributed Bellman-Ford algorithm. Let $k$ be the lowest such level $r$. Note that $C'(t) \subset C_k(s,t)$ since $T_j$ represents a laminar decomposition and $k$ is at least $l_0$. For $k = h$, $C_k(s,t) = C_h(s,t) \in \Pi_h^{(j)}$ is the root cluster $X$ of $T_j$ that corresponds to the entire network $G$. In this case, $C_k(s,t) = C_h(s,t)$ always contains all shortest paths from $s$ to $C'(t) \in \Pi_{i'}^{(j)}$ in $T_j$, where $i' = h - \ell$, given that $G$ is a connected graph.

$h^i_{se_c}$: Length of the shortest path from node $s$ to an exchange node $e_c \in A_{i'}(t) \cap C'(t)$ as contained in $C_k(s,t)$ defined above. The superscript $i$ stands for an internal path within $C_k(s,t)$. At equilibrium, $h^i_{se_0} = \mathrm{HF}(s,C'(t)) = d(s,e_0)$, since the shortest path from $s$ to $e_0$ is internal to $C_k(s,t)$, and by Claim 3.1, $\mathrm{HF}(s,C'(t)) = d(s,e_0)$ in $\mathrm{Route}_s$ given that $\mathrm{ValidPath}_s(C'(t))$ is true in $\mathrm{Route}_s$ and $e_0$ is the closest entry point to $C'(t)$ for node $s$. Recall that $\mathrm{HF}(s,C'(t))$ is the current path length filed in $\mathrm{Route}_s$ for node $s$ to reach $C'(t) \in \Pi_{i'}^{(j)}$ via its current $\mathrm{NextHop}_s(\mathrm{addr}(C'(t)))$. Note that when the shortest path from $s$ to $e_c$ is not internal to $C_k(s,t)$, we denote this by $h^i_{se_c} = \infty$.

In order to reach $t$, function $\mathrm{PrefMatch}_s(t)$ is called by the Forwarding algorithm at node $s$, which looks across $\mathrm{Route}_s$ for all trees and picks a tree $T_j$ that contains $C'(t)$ with a closest entry point $e_0 \in A_{i'}(t) \cap C'(t)$. Node $s$ then stores $(\mathrm{addr}(t), \mathrm{addr}(C'(t)))$ in the packet header and sends the packet to $\mathrm{NextHop}_s(\mathrm{addr}(C'(t)))$; the packet header remains the same while intermediate nodes $v$ forward the packet along a shortest path from $s$ to $e_0$ that is contained in the common cluster $C_k(s,t)$ of $s$ and $t$ in $T_j$, until it reaches $e_0$.
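
The selection rule described above can be sketched as follows. This is a simplified model written for illustration, not the Forwarding algorithm of Figure 2.4.2 itself: the `Entry` record and the function `pref_match` are hypothetical names, cluster membership is modeled with an explicit set instead of address-prefix matching, and ties between equally low levels are broken by the smaller HF value, which is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One routing table entry at node s (hypothetical layout for this sketch)."""
    tree: int           # index j of the decomposition tree T_j
    level: int          # level i' of the cluster in that tree
    members: frozenset  # nodes of the cluster (stands in for the address prefix)
    hf: float           # current HF(s, C') value
    valid: bool         # the ValidPath bit
    next_hop: str       # neighbor to forward to


def pref_match(route, t):
    """Return the lowest-level valid entry whose cluster contains t, or None."""
    candidates = [e for e in route if e.valid and t in e.members]
    if not candidates:
        return None
    return min(candidates, key=lambda e: (e.level, e.hf))


route_s = [
    Entry(tree=0, level=3, members=frozenset({"t", "a", "b"}), hf=5.0, valid=True, next_hop="n1"),
    Entry(tree=1, level=1, members=frozenset({"t"}), hf=9.0, valid=False, next_hop="n2"),
    Entry(tree=1, level=2, members=frozenset({"t", "a"}), hf=7.0, valid=True, next_hop="n3"),
]

choice = pref_match(route_s, "t")
print(choice.tree, choice.level, choice.next_hop)
```

In this toy table the level-1 entry is skipped because its ValidPath bit is false, so the level-2 entry in tree 1 is selected even though the level-3 entry has a smaller HF value; the lowest valid level takes priority.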

The key observation we have regarding a path $h^c_{st}$ from $s$ to $t$ is the following. The path may not be contained within the lowest common ancestor $\mathrm{lca}^j(s,t)$ of $s$ and $t$ in a particular tree $T_j$. However, the segment from $s$ to $C'(t)$ is contained within $C_k(s,t)$ in $T_j$, where $l_0 \le k \le i' + \ell$, when following a shortest path from $s$ to $e_0$, which is the closest entry point to $C'(t)$. Recall that $C_k(s,t)$ is a common cluster of $s$ and $C'(t)$ at a level higher than that of $\mathrm{lca}^j(s,t)$. Conceptually, we route packets from $s$ to $t$ within $C_k(s,t) \in \Pi_k^{(j)}$ to avoid being stuck in $\mathrm{lca}^j(s,t)$, which may not contain any path (e.g., when $\mathrm{lca}^j(s,t)$ in $T_j$ is disconnected) or may contain only very long paths from $s$ to $t$. The shortest path from $s$ to $e_0$ is thus an internal path relative to $C_k(s,t)$, which we denote by $h^i_{se_0}$.

Finally,  We  define  a  constant  (|)  =  -^  that  we  will  use  throughout  this  section.  It 
is  easy  to  verify  that  2ri,_£  <  <  (|)Ep'.  Recall  that  I  =  0(logp  1  /sx)  and 

p  =  0(^),  where  we  choose  suitable  constants  so  that  p^  <  is  satisfied.  The 
rest of this section is dedicated to the proof of the main theorem of this section, before which we first prove two lemmas regarding the level of $C'(t)$ and $C_k(s,t)$ given $d(s,t)$. Note that we always have $k \le h$ and $i' \le h - \ell$. We will ignore the case when $k = h$ until the end of this section.

Lemma 3.1. Let $d(s,t) \le (1-\phi)\varepsilon\rho^{i}$, where $1 \le i \le h$. The cluster $C'(t) \in \Pi_{i'}^{(j)}$ in $T_j$ that has the longest valid prefix matching with $t$ with $\mathrm{ValidPath}_s(C'(t)) = \mathrm{true}$ is at a level $i' \le \max(0, i - \ell)$; the common cluster $C_k(s,t) \in \Pi_k^{(j)}$ as defined above that contains the shortest path from $s$ to $C'(t)$ is at level $k \le i$.

Proof. We first prove the lemma when $i \le \ell$ with the following claim.

Case $i \le \ell$.

Claim 3.2. Let $\varepsilon\rho^{i-1} < d(s,t) \le \varepsilon\rho^{i}$ for $1 \le i \le \ell$. Then $C'(t) = C_0(t)$ is $t$ itself; the lowest common cluster $C_k(s,t)$ such that $B(s,\varepsilon\rho^{k}) \subseteq C_k(s,t)$ and $B(s,\varepsilon\rho^{k})$ contains the shortest path from $s$ to $C_0(t)$, i.e., $t$ itself, is at level $k = i$.

Proof. Node $s$ has a routing table entry for all $t$ such that $d(s,t) \le \varepsilon\rho^{\ell}$, since $B(s,d(s,t)) \subseteq B(s,\varepsilon\rho^{\ell})$ is fully contained in some level-$\ell$ cluster $C_\ell(s) \in \Pi_\ell^{(j)}$ in some tree $T_j$, and $C'(t)$ is $C_0(t) \in \Pi_0^{(j)}$.

The properties of the $(\rho,\varepsilon)$-PPHD ensure that there is at least one tree $T_j$ such that $B(s,\varepsilon\rho^{i}) \subseteq C_i(s) \in \Pi_i^{(j)}$ in $T_j$. Since $d(s,t) \le \varepsilon\rho^{i}$, we know that $t \in B(s,\varepsilon\rho^{i})$ and $C_0(t) \subset C_i(s)$ in $T_j$. The lowest common cluster $C_k(s,t)$ such that $B(s,\varepsilon\rho^{k}) \subseteq C_k(s,t)$ and $B(s,\varepsilon\rho^{k})$ contains the shortest path from $s$ to $C'(t) = C_0(t)$, i.e., $t$ itself, is $C_i(s) \in \Pi_i^{(j)}$ in tree $T_j$ and $k = i$. □

We now prove the general case when $i > \ell$.

Case $h - 1 \ge i > \ell$. Let $x' \in A_{i-\ell}(t)$ be an arbitrary entry point to some level-$(i-\ell)$ cluster $C \ni t$ in some tree; hence $d(x',t) \le 2\eta_{i-\ell} \le \phi\varepsilon\rho^{i}$ since $x', t \in C$. Applying the triangle inequality, we have $d(s,x') \le d(s,t) + d(t,x') \le \varepsilon\rho^{i}$; thus all shortest paths from $s$ to $x'$, for all $x' \in A_{i-\ell}(t)$, are contained in $B(s,\varepsilon\rho^{i})$.

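The properties of the $(\rho,\varepsilon)$-PPHD ensure that there is at least one tree $T_q$ such that $B(s,\varepsilon\rho^{i})$ is not cut in the level-$i$ partition $\Pi_i^{(q)}$; let $C_i(s) \in \Pi_i^{(q)}$ be the level-$i$ cluster in $T_q$ such that $B(s,\varepsilon\rho^{i}) \subseteq C_i(s)$. Since $d(s,t) \le (1-\phi)\varepsilon\rho^{i}$, we have $t \in B(s,\varepsilon\rho^{i}) \subseteq C_i(s) \in \Pi_i^{(q)}$. Let $C_{i-\ell}(t) \in \Pi_{i-\ell}^{(q)}$ be the level-$(i-\ell)$ cluster in $T_q$ containing $t$; we know that $C_{i-\ell}(t) \subset C_i(s)$, since $t \in \{C_{i-\ell}(t) \cap C_i(s)\}$ and $T_q$ represents a laminar decomposition. Hence we have $C_i(s) = C_i(t) = C_i(s,t)$ in the level-$i$ partition $\Pi_i^{(q)}$ in tree $T_q$.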

The $\mathrm{ValidPath}_s(C_{i-\ell}(t))$ bit must be set true in $\mathrm{Route}_s$ by the distributed Bellman-Ford algorithm in node $s$, since (a) $B(s,\varepsilon\rho^{i}) \subseteq C_i(s,t) \in \Pi_i^{(q)}$ and (b) $\mathrm{HF}(s,C_{i-\ell}(t)) \le \varepsilon\rho^{i}$ in $\mathrm{Route}_s$ for entry $C_{i-\ell}(t) \in \Pi_{i-\ell}^{(q)}$ in tree $T_q$ at equilibrium, given that all shortest paths from $s$ to an entry point $x'$, for all $x' \in A_{i-\ell}(t) \cap C_{i-\ell}(t)$, are internal to $B(s,\varepsilon\rho^{i})$. Thus $\mathrm{PrefMatch}_s(t)$ can (and may indeed) just return $\mathrm{addr}(C_{i-\ell}(t))$ given no "better" choices, in which case $i' = i - \ell$ and $k \le i$.

However, $\mathrm{PrefMatch}_s(t)$ always finds a cluster $C'(t) \in \Pi_{i'}^{(j)}$ at the lowest level across all trees, such that $t \in C'(t)$ and $\mathrm{ValidPath}_s(C'(t))$ is true in $\mathrm{Route}_s$; hence $C'(t)$ is at level $i' \le i - \ell$.

We know that $B(s,\varepsilon\rho^{r}) \subseteq C_r(s,t) \in \Pi_r^{(j)}$ and $\mathrm{HF}(s,C'(t)) \le \varepsilon\rho^{r}$ must both hold, for some $l_0 \le r \le i' + \ell$, in order for the $\mathrm{ValidPath}_s(C'(t))$ bit to be true, due to the specification of the distributed Bellman-Ford algorithm. Let $k$ be the lowest such $r$; we have $k \le i' + \ell \le i$ for $i > \ell$.

Case $i = h$. We have $k \le h$ and $i' \le h - \ell$ trivially, since both hold for all possible distances $d(s,t)$ up to $\Delta$, which is the diameter of the network $G$. □

Claim 3.3. When $C'(t) \in \Pi_1^{(j)}$ is at level 1, $d(s,t) > \varepsilon\rho^{\ell}$.

Proof. We prove the claim by contradiction. Assume that $d(s,t) \le \varepsilon\rho^{\ell}$. By Claim 3.2, we have $C'(t) = C_0(t)$, contradicting the assumption that $C'(t)$ is at level 1. □

Lemma 3.2. Let $C'(t) \in \Pi_{i'}^{(j)}$ be the cluster returned by function $\mathrm{PrefMatch}_s(t)$ at Step 3 in the Forwarding algorithm and let its level be $i'$, where $h - \ell \ge i' \ge 1$. Then $d(s,t) > (1-\phi)\varepsilon\rho^{i'+\ell-1}$.

Proof. We prove the lemma by contradiction. Assume that $d(s,t) \le (1-\phi)\varepsilon\rho^{i'+\ell-1}$. By Lemma 3.1, the cluster $C'(t) \ni t$ that has the longest valid prefix matching with $t$ with the $\mathrm{ValidPath}_s(C'(t))$ bit set true in $\mathrm{Route}_s$ is at level at most $i' - 1$, thus contradicting the assumption that $C'(t)$ is at level $i'$. □

We next prove the following lemma regarding the level of $C'(t)$ given the level of $C_k(s,t)$.

Lemma 3.3. Let a level-$k$ cluster $C_k(s,t) \in \Pi_k^{(j)}$ in tree $T_j$, where $h - 1 \ge k \ge \ell$, be the lowest-level common cluster of $s$ and $t$ such that a shortest path from $s$ to $C'(t)$ is contained in $B(s,\varepsilon\rho^{k}) \subseteq C_k(s,t)$. Then $C'(t)$ is at either level $k - \ell$ or level $k - \ell + 1$.

Proof. By definition of $C_k(s,t)$, we know that $l_0 \le k \le i' + \ell$ and $l_0 \ge i' + 1$, where $l_0$ is the level of $\mathrm{lca}^j(s,C'(t))$. Thus $k - \ell \le i' \le k - 1$. The lowest level that $C'(t)$ can be at is $k - \ell$, and we argue that $C'(t)$ cannot be at a level higher than $k - \ell + 1$.

Let $e_0$ be a closest entry point to $C'(t)$ for $s$, such that $e_0 \in A_{i'}(t) \cap C'(t)$ and the shortest path from $s$ to $e_0$ is internal to $B(s,\varepsilon\rho^{k}) \subseteq C_k(s,t) \in \Pi_k^{(j)}$; hence $d(s,e_0) \le \varepsilon\rho^{k}$. Since $C'(t)$ is at least one level below $C_k(s,t)$ in $T_j$ and $e_0, t \in C'(t)$, we have $d(e_0,t) \le 2\eta_{k-1}$. Note that $C'(t) \subset C_k(s,t)$ by Fact 3.1. Applying the triangle inequality, we have $d(s,t) \le d(s,e_0) + d(e_0,t) \le \varepsilon\rho^{k} + 2\eta_{k-1}$.

Now we examine the distance $d(s,y')$ for all $y' \in A_{k-\ell+1}(t)$. Given that $d(t,y') \le 2\eta_{k-\ell+1}$, we apply the triangle inequality and obtain:

$$d(s,y') \le d(s,t) + d(t,y') \le d(s,e_0) + d(e_0,t) + d(t,y') \le \varepsilon\rho^{k} + 2\eta_{k-1} + 2\eta_{k-\ell+1} \le \varepsilon\rho^{k+1},$$

where $\ell \ge 1$ and p = 0(^).

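Thus all shortest paths from $s$ to $y'$, for all $y' \in A_{k-\ell+1}(t)$, are contained in $B(s,\varepsilon\rho^{k+1})$.

The properties of the $(\rho,\varepsilon)$-PPHD ensure that there is at least one tree $T_{j'}$ such that $B(s,\varepsilon\rho^{k+1}) \subseteq C_{k+1}(s) \in \Pi_{k+1}^{(j')}$. Let $C_{k-\ell+1}(t) \in \Pi_{k-\ell+1}^{(j')}$ be the level-$(k-\ell+1)$ cluster that contains $t$ in $T_{j'}$. Given that $t \in B(s,\varepsilon\rho^{k+1}) \subseteq C_{k+1}(s)$, we know that $C_{k-\ell+1}(t) \subset C_{k+1}(s) \in \Pi_{k+1}^{(j')}$, since $t \in \{C_{k-\ell+1}(t) \cap C_{k+1}(s)\} \ne \emptyset$ and $T_{j'}$ represents a laminar decomposition. Thus $C_{k-\ell+1}(t) \in \Pi_{k-\ell+1}^{(j')}$ must appear in the routing table of $s$ with $\mathrm{ValidPath}_s(C_{k-\ell+1}(t))$ set true, since $C_{k-\ell+1}(t) \subset C_{k+1}(s)$ is within $\ell$ levels below $C_{k+1}(s)$ in $T_{j'}$ and all shortest paths from $s$ to $C_{k-\ell+1}(t)$ are contained in $B(s,\varepsilon\rho^{k+1}) \subseteq C_{k+1}(s)$ in $T_{j'}$.

Thus the level $i'$ of $C'(t)$ must satisfy $k - \ell \le i' \le k - \ell + 1$ for $C'(t) \in \Pi_{i'}^{(j)}$ to be returned by $\mathrm{PrefMatch}_s(t)$. □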

Fact 3.2. When $k = h$ and $C_k(s,t) \in \Pi_k^{(j)}$ is $C_h(s,t) \in \Pi_h^{(j)}$, which is the entire network $G$, we know that $C'(t) \in \Pi_{i'}^{(j)}$ is at level $i' = h - \ell = k - \ell$.

The next lemma shows the path characteristics from $s$ to $t$ up to the entry point $e_0$ of $C'(t) \in \Pi_{i'}^{(j)}$.

Lemma 3.4. All messages to be forwarded or sent from node $s$ to node $t$ will follow the same shortest path up to the closest entry point $e_0$ of $C'(t) \in \Pi_{i'}^{(j)}$ to $s$. The shortest path from $s$ to $e_0$ is internal to $C_k(s,t) \in \Pi_k^{(j)}$ in $T_j$; it has a length of $h^i_{se_0}$ that satisfies:

$$h^i_{se_0} = \min_{e_c \in A_{i'}(t) \cap C'(t)} \{h^i_{se_c}\}, \tag{3.3.1}$$

where $i'$ is the level of $C'(t) \in \Pi_{i'}^{(j)}$ and $k \le i' + \ell$, and $C_k(s,t) \in \Pi_k^{(j)}$, $A_{i'}(t)$, and $h^i_{se_c}$ are as defined above, and $h^i_{se_c} = \infty$ when the shortest path from $s$ to $e_c$ is not contained in $C_k(s,t)$. At equilibrium, $h^i_{se_0} = \mathrm{HF}(s,C'(t)) = d(s,e_0)$. Finally, all vertices $v$ on the shortest path from $s$ to $e_0$ have a non-null $\mathrm{NextHop}_v(\mathrm{addr}(C'(t)))$ and share the same closest entry point $e_0$ to cluster $C'(t)$.

Proof. By the definition of $C_k(s,t)$, for $k \le h - 1$, we know that $C_k(s,t) \in \Pi_k^{(j)}$ is the level-$k$ cluster, where $l_0 \le k \le i' + \ell$, in tree $T_j$, that is the lowest-level common cluster of $s$ and $t$ such that $B(s,\varepsilon\rho^{k}) \subseteq C_k(s,t) \in \Pi_k^{(j)}$ and $B(s,\varepsilon\rho^{k})$ contains a shortest path from $s$ to $C'(t) \in \Pi_{i'}^{(j)}$ in $T_j$; specifically, $B(s,\varepsilon\rho^{k})$ contains a shortest path from $s$ to $e_0$, and $d(s,e_0) \le \varepsilon\rho^{k}$. By Fact 3.1, we have $C'(t) \subset C_k(s,t)$. When $k = h$, $C_k(s,t) = C_h(s,t) \in \Pi_h^{(j)}$ is the root cluster $X$ and naturally contains all shortest paths from $s$ to $C'(t) \subset C_h(s,t)$, given that $G$ is a connected graph.

All the nodes in $C_k(s,t) \in \Pi_k^{(j)}$ contain one entry for $C'(t) \in \Pi_{i'}^{(j)}$ in their routing tables, since $k \le i' + \ell$ and $C'(t) \subset C_k(s,t)$ is a cluster within $\ell$ levels below $C_k(s,t) \in \Pi_k^{(j)}$ in tree $T_j$. Propagation and subsequent updating of routing information among nodes of $C_k(s,t) \in \Pi_k^{(j)}$ in $T_j$ is equivalent to finding the minimum path internal to $C_k(s,t)$ from any node $u \in \{C_k(s,t) - C'(t)\}$ to an entry point of $C'(t)$ that is closest to node $u$; for $s$, the closest entry point to $C'(t)$ is $e_0$ such that $h^i_{se_0} = \min_{e_c \in A_{i'}(t) \cap C'(t)} \{h^i_{se_c}\}$. Hence, at equilibrium, $e_0$ is on the minimal path from $s$ to $C'(t)$ and $h^i_{se_0} = \mathrm{HF}(s,C'(t))$ represents the length of such a minimal path.

All shortest paths of length $d(s,e_0)$ from $s$ to $e_0$ are internal to $C_k(s,t)$; when $k \le h - 1$, they are within $B(s,\varepsilon\rho^{k}) \subseteq C_k(s,t) \in \Pi_k^{(j)}$. Hence, at equilibrium, within $B(s,\varepsilon\rho^{k}) \subseteq C_k(s,t) \in \Pi_k^{(j)}$ for $k \le h - 1$, or within $C_k(s,t) = X$ for $k = h$, a shortest path of length $h^i_{se_0} = d(s,e_0)$ is formed between $s$ and $e_0$ among nodes within a connected component that share a common entry for $C'(t) \subset C_k(s,t)$ in their routing tables. Thus we have $\mathrm{HF}(s,C'(t)) = h^i_{se_0} = d(s,e_0)$.

For any node $v$ on one of these shortest paths from $s$ to $e_0$, $s$ and $v$ must share the same closest entry point $e_0$ to $C'(t)$ at equilibrium, due to the execution of the distributed Bellman-Ford algorithm; furthermore, intermediate nodes $v$ will be able to route the packet toward $C'(t) \in \Pi_{i'}^{(j)}$ in $C_k(s,t)$ consistently since they each contain an entry for $C'(t) \in \Pi_{i'}^{(j)}$ with a non-null $\mathrm{NextHop}_v(\mathrm{addr}(C'(t)))$ field, given that these paths stay within $C_k(s,t) \in \Pi_k^{(j)}$, where $k \le i' + \ell$. The Forwarding algorithm will forward messages from node $s$ destined to node $t$ along the shortest path thus formed to first reach $C'(t)$ in tree $T_j$. □

The process of finding the next entry point repeats once the packet reaches $e_0$, an entry point to $C'(t) \in \Pi_{i'}^{(j)}$ in tree $T_j$, until the packet reaches its destination $t$. For example, $e_0$ selects a new tree $T_{j'}$ that contains the next cluster $C''(t) \in \Pi_{i''}^{(j')}$ with a longer prefix matching with $t$ than $C'(t)$, and updates the packet header with $C''(t)$ accordingly. Note that $C''(t)$ and $C'(t)$ may belong to two different trees; hence, while intermediate nodes between one entry point and another never switch trees, the packet is free to switch trees upon reaching an entry point. The next lemma states the upper bound on the level $i''$ of $C''(t) \in \Pi_{i''}^{(j')}$ and the level of the common cluster $C_{k'}(x,t) \in \Pi_{k'}^{(j')}$ that contains a shortest path from $x$ to $C''(t)$ in $B(x,\varepsilon\rho^{k'}) \subseteq C_{k'}(x,t) \in \Pi_{k'}^{(j')}$. If $x$ is $t$ itself, we are done with forwarding.

Lemma 3.5. Let $1 \le i' \le h - \ell$ be the level of $C'(t) \in \Pi_{i'}^{(j)}$. Once the packet from $s$ reaches an entry point $x$ in $A_{i'}(t) \cap C'(t)$, including $e_0$, $x$ will find a new level-$(i'')$ cluster $C''(t) \in \Pi_{i''}^{(j')}$ at level $i'' \le \max(0, i' - 2)$ in some tree $T_{j'}$, and the common cluster $C_{k'}(x,t) \in \Pi_{k'}^{(j')}$ as defined above is at a level $k' \le i' - 2 + \ell$.

Proof.  We  have  d{xf)  <  2ri//  <  since  x  €  A,/(t)  OC'{t)  is  an  entry  point  to 

some  level-(/')  cluster  C'{t)  E  containing  t.  We  have  d{x,t)  <  (1  —  (|))sp' 
so  long  as  p^  <  which  can  be  satisfied  when  suitable  constants  are  chosen 
for  i  —  0(logp  1  /sx)  and  p  =  0(^)-  Lemma  3.1  tells  us  that  k'  <  i'  —  2  +  i  and 
C"{t)  E  n|i^)  is  at  level  <  max  (0,  i'  —  2).  □ 

We  are  now  ready  for  the  main  theorem  that  summarizes  the  path  properties. 

Theorem 3.1. Following the Forwarding algorithm in Section 2.4.3, for all $k \le h$, the path from $s$ to $t$ as derived from the routing information at node $s$ satisfies the recursive equation $h^c_{st} = h^i_{se_0} + h^c_{e_0t}$, where the shortest path $h^i_{se_0}$ from $s$ to $e_0$ is contained in $C_k(s,t)$ and its properties are as specified in Lemma 3.4. Secondly, the lookup path has a stretch of at most $(1+x)$. Finally, the algorithm switches trees at most $\max(0, k - \ell + 1)$ times. When $d(s,t) \le (1-\phi)\varepsilon\rho^{n}$, where $n < h$, we have $k \le n$; otherwise, $k \le h$.

Proof. The proof of the theorem is by induction on $k$, which is the level of the lowest common cluster $C_k(s,t)$ of $s$ and $C'(t)$ such that a shortest path from $s$ to $C'(t)$ is contained in (a) $B(s,\varepsilon\rho^{k}) \subseteq C_k(s,t)$ for $k \le h - 1$, or in (b) $C_k(s,t) = C_h(s,t)$ for $k = h$. Recall that $i'$ is the level of $C'(t) \in \Pi_{i'}^{(j)}$, and $e_0 \in A_{i'}(t) \cap C'(t)$ is the closest entry point to $C'(t)$ for node $s$ within $C_k(s,t) \in \Pi_k^{(j)}$.

Base Case: $k \le \ell - 1$.

We first prove the following claim.

Claim 3.4. If $C_k(s,t)$ is at level $k \le \ell - 1$, then $C'(t) = C_0(t)$ and $d(s,t) \le \varepsilon\rho^{\ell-1}$.

Proof. By the definition of $C_k(s,t)$, we know $B(s,\varepsilon\rho^{k}) \subseteq C_k(s,t) \in \Pi_k^{(j)}$ in $T_j$ and $d(s,e_0) \le \varepsilon\rho^{k}$, where $e_0 \in C'(t)$ is the closest entry point to $C'(t)$ for node $s$. Thus $d(s,e_0) \le \varepsilon\rho^{\ell-1}$ for $k \le \ell - 1$. Since $C'(t) \in \Pi_{i'}^{(j)}$ is a descendant of $C_k(s,t) \in \Pi_k^{(j)}$ in $T_j$, it must be at a level lower than $k$; hence $d(e_0,t) \le 2\eta_{\ell-2} \le 4\rho^{\ell-2}$, since $e_0, t \in C'(t)$, and $C'(t)$ is at level $i' \le \ell - 2$.

Applying the triangle inequality, we have $d(s,t) \le d(s,e_0) + d(e_0,t) \le \varepsilon\rho^{\ell-1} + 4\rho^{\ell-2} \le \varepsilon\rho^{\ell}$. Thus by Claim 3.2, we have $C'(t) = C_0(t)$, which is $t$ itself; furthermore, $e_0 = t$ and $d(s,t) = d(s,e_0) \le \varepsilon\rho^{\ell-1}$. □

The above claim shows that $C_k(s,t) \in \Pi_k^{(j)}$ in tree $T_j$ contains a shortest path from $s$ to $C'(t) = C_0(t) \in \Pi_0^{(j)}$, and $t$ is the closest entry point to $C_0(t)$, which is $t$ itself. Thus $h^c_{e_0t} = 0$, since a node's distance to itself is zero. It remains to show that $h^c_{st} = h^i_{st}$; recall that $h^i_{st}$ refers to the shortest path from $s$ to $t$ as included in $C_k(s,t)$. This is true since the routing table of every node $v$ in $C_k(s,t) \in \Pi_k^{(j)}$ for $k \le \ell - 1$ contains an entry for $C_0(t) = t$, and a shortest path from $s$ to $t$ is contained in $B(s,\varepsilon\rho^{k}) \subseteq C_k(s,t)$ in tree $T_j$; hence at equilibrium, the clustered path between $s$ and $t$ as derived from $\mathrm{Route}_s$ is the shortest path from $s$ to $t$, and it is internal to $C_k(s,t)$, i.e., $h^c_{st} = \mathrm{HF}(s,C_0(t)) = h^i_{st} = d(s,t)$, where $k \le \ell - 1$. The stretch is exactly 1 since $h^c_{st}/d(s,t) = 1$. The forwarding algorithm does not switch trees at all.

Case $k = \ell$. By Lemma 3.3, $C'(t)$ is at level 0 or 1. When $C'(t)$ is at level $k - \ell = 0$, the proof is the same as that in the base case.

When $C'(t)$ is at level $k - \ell + 1 = 1$, we have $d(s,t) > \varepsilon\rho^{\ell}$ by Claim 3.3. All messages to be forwarded or sent from node $s$ to node $t$ will first follow the same shortest path of length $h^i_{se_0} = d(s,e_0)$, which is internal to $C_\ell(s,t)$, up to the closest entry point $e_0$ of $C'(t)$, as specified in Lemma 3.4.

Upon reaching $e_0$, Lemma 3.5 can be applied to show that the clustered path from $e_0$ to $t$ is entirely contained in a level-$(k')$ cluster $C_{k'}(e_0,t)$ in some tree $T_{j'}$, where $k' \le i' - 2 + \ell = \ell - 1$; thus, as proved in the base case, $h^c_{e_0t} = d(e_0,t)$. The clustered path from $s$ to $t$ as derived from $\mathrm{Route}_s$ indeed satisfies $h^c_{st} = h^i_{se_0} + h^c_{e_0t}$, where $h^i_{se_0} = d(s,e_0)$ and $h^c_{e_0t} = d(e_0,t)$.

Hence we obtain the bound on the entire path: $h^c_{st} = d(s,e_0) + d(e_0,t) \le d(s,t) + 2d(e_0,t)$, where $d(s,e_0) \le d(s,t) + d(e_0,t)$ by the triangle inequality. And the
path stretch is: $\frac{h^c_{st}}{d(s,t)} \le 1 + \frac{2d(e_0,t)}{d(s,t)} \le 1 + x$, where $\ell \ge 1 + (\log_\rho 4/\varepsilon x)$.

The algorithm switches trees at most once.

Case $k \ge \ell + 1$. First we assume that the theorem is true up to $k - 1$; let us show that it is true for $k$.

Let $C_k(s,t) \in \Pi_k^{(j)}$ be the level-$k$ cluster that contains a shortest path from $s$ to $C'(t) \in \Pi_{i'}^{(j)}$ in $T_j$. According to Lemma 3.4, all messages to be forwarded or sent from node $s$ to node $t$ will first follow the same shortest path of length $h^i_{se_0}$ that is internal to $C_k(s,t)$, up to the closest entry point $e_0$ of $C'(t)$. By Lemma 3.3, we know that $C'(t)$ is at level $k - \ell$ or $k - \ell + 1$ when $k \le h - 1$. When $k = h$, $C'(t)$ is at level $h - \ell = k - \ell$.

Upon reaching $e_0$, Step 3 of the forwarding algorithm is applied to find $C''(t)$, which has the longest valid matching with $t$ in the routing table of $e_0$. Since $C'(t)$ is at level $k - \ell$ or $k - \ell + 1$, Lemma 3.5 shows that the lowest common cluster $C_{k'}(e_0,t)$ of $e_0$ and $t$, such that $B(e_0,\varepsilon\rho^{k'}) \subseteq C_{k'}(e_0,t)$ and $B(e_0,\varepsilon\rho^{k'})$ contains a shortest path from $e_0$ to $C''(t)$, is at level $k' \le i' - 2 + \ell \le k - 1$. Thus $h^c_{e_0t}$ is known from the induction hypothesis and $h^c_{st} = h^i_{se_0} + h^c_{e_0t}$.

Now we proceed to prove the bound on the stretch for level $k$. Let $C'(t)$ be at level $\beta - \ell$, where $\beta$ is $k$ or $k + 1$; hence $d(e_0,t) \le 2\eta_{\beta-\ell}$ given that $e_0, t \in C'(t)$. By Lemma 3.2, $d(s,t) > (1-\phi)\varepsilon\rho^{\beta-1}$, where $i' = \beta - \ell \ge 1$ for all $k$.

By Lemma 3.4, $h^i_{se_0}$ is the shortest path from $s$ to $e_0$ that is internal to $C_k(s,t)$ and $h^i_{se_0} = d(s,e_0)$; applying the triangle inequality, we obtain:

$$h^i_{se_0} = d(s,e_0) \le d(s,t) + d(e_0,t) \le d(s,t) + 2\eta_{\beta-\ell}. \tag{3.3.2}$$

By the induction hypothesis, we know

$$h^c_{e_0t} \le (1+x)\,d(e_0,t) \le 2(1+x)\,\eta_{\beta-\ell}. \tag{3.3.3}$$

Finally, we get the bound on the total path length from $s$ to $t$:

$$h^c_{st} = h^i_{se_0} + h^c_{e_0t} \le d(s,t) + 2(2+x)\,\eta_{\beta-\ell}. \tag{3.3.4}$$



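Now using the fact that $d(s,t) > (1-\phi)\varepsilon\rho^{\beta-1}$, and the fact that $x < 1$, we obtain the path stretch from $s$ to $t$:

$$\frac{h^c_{st}}{d(s,t)} \le 1 + \frac{2(2+x)\,\eta_{\beta-\ell}}{d(s,t)} < 1 + \frac{6\,\eta_{\beta-\ell}}{(1-\phi)\varepsilon\rho^{\beta-1}} \le 1 + x, \tag{3.3.5}$$

where $\ell \ge (\log_\rho 8/\varepsilon x) + 2$.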
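
The arithmetic behind the total path bound and the stretch can be checked numerically. The following sketch is illustrative only: the function names are hypothetical, and the numeric values of the distance, the cluster radius bound `eta`, and `x` are made up for the example rather than taken from the decomposition.

```python
def path_length_bound(d_st, eta, x):
    """Upper bound on the clustered path length: d(s,t) + 2(2 + x) * eta."""
    return d_st + 2 * (2 + x) * eta


def stretch_bound(d_st, eta, x):
    """Upper bound on the stretch, i.e. the path length bound divided by d(s,t)."""
    if d_st <= 0:
        raise ValueError("d(s,t) must be positive")
    return path_length_bound(d_st, eta, x) / d_st


def loose_stretch_bound(d_st, eta):
    """Looser bound 1 + 6 * eta / d(s,t), valid whenever x < 1."""
    if d_st <= 0:
        raise ValueError("d(s,t) must be positive")
    return 1 + 6 * eta / d_st


# Made-up numbers: distance 100, cluster radius bound 1, stretch parameter x = 0.5.
tight = stretch_bound(100.0, 1.0, 0.5)
loose = loose_stretch_bound(100.0, 1.0)
print(tight, loose)
```

Because $2(2+x) < 6$ for $x < 1$, the first value never exceeds the second; the remaining step of the argument is to lower-bound $d(s,t)$ so that the second value is at most $1 + x$.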

Finally, the algorithm switches trees at most $k - \ell$ times to finally route within a level-$\ell$ cluster, after which it switches trees at most once, thus adding up to a total of $k - \ell + 1$ times.

Now we look at the bound on $k$ itself. When $d(s,t) \le (1-\phi)\varepsilon\rho^{n}$, for all $n < h$, we have $k \le n$ by Lemma 3.1. We now verify that all statements in the theorem still apply, for the clustered path $h^c_{st}$, when $d(s,t) > (1-\phi)\varepsilon\rho^{h-1}$ and $C_k(s,t)$ is $C_h(s,t)$. First of all, when $C_k(s,t) = C_h(s,t)$, following the Forwarding algorithm in Section 2.4.3, we know that $C'(t)$ is at level $h - \ell$ and hence $d(s,t) > (1-\phi)\varepsilon\rho^{h-1}$ by Lemma 3.2. The shortest path from $s$ to $e_0 \in C'(t)$, of length $h^i_{se_0}$, is internal to $C_h(s,t)$, the entire network $G$. Upon reaching $e_0$, $h^c_{e_0t}$ is known by applying the theorem directly since $d(e_0,t) \le 2\eta_{h-\ell} \le (1-\phi)\varepsilon\rho^{h-1}$. Thus we have $h^c_{e_0t} \le (1+x)\,d(e_0,t) \le (1+x)\,2\eta_{h-\ell}$. Hence, the entire path satisfies the equation $h^c_{st} = h^i_{se_0} + h^c_{e_0t}$. Second, with the same calculation as in the proof above, it is easy to verify that the entire path $h^c_{st}$ has a stretch of at most $(1+x)$ given that $d(s,t) > (1-\phi)\varepsilon\rho^{h-1}$ and $h^c_{e_0t} \le (1+x)\,d(e_0,t) \le (1+x)\,2\eta_{h-\ell}$. The algorithm switches trees at most $(h - \ell + 1)$ times. □

Corollary 3.1. For all $t$ such that $d(s,t) \le \varepsilon\rho^{\ell}$, the path stretch is 1.

Proof. Node $s$ has a routing table entry for all $t$ such that $d(s,t) \le \varepsilon\rho^{\ell}$, since $B(s,d(s,t)) \subseteq B(s,\varepsilon\rho^{\ell})$ is fully contained in some level-$\ell$ cluster $C_\ell(s) \in \Pi_\ell^{(j)}$ in some tree $T_j$, and $C'(t)$ is $C_0(t) \in \Pi_0^{(j)}$; the base case of the above proof shows that the path stretch is 1. □
Part II: Edge Disjoint Paths

4 Edge-disjoint Paths in Moderately Connected Graphs


In the next three chapters, we prove the following theorem regarding undirected EDP.

Theorem 4.1. There is a polylog $n$-approximation algorithm for the edge disjoint paths problem in a general graph $\mathcal{G}$ with minimum cut and node degree $\Omega(\log^5 n)$.

4.1  The  Approach 

We begin with a fractional relaxation of the problem, where each terminal pair can route a real-valued amount of flow between 0 and 1, and this flow can be split fractionally across a set of distinct paths. This can be expressed as an LP and solved efficiently. We denote the value of an optimal fractional LP solution by OPT*. Our algorithm routes a polylogarithmic fraction of this value using integral edge-disjoint paths.

The algorithm proceeds by decomposing the graph into well-connected subgraphs, based on OPT*, so that the terminal pairs that remain within each subgraph are “well-connected”, following a decomposition procedure of Chekuri et al. [2005]. Then, for each well-connected subgraph $G$, we construct an expander graph that can be embedded into $G$ using its terminal set. We use a result by Khandekar, Rao and Vazirani (Khandekar et al. [2006]), who show that one can build an expander graph $H$ on a set of nodes $V$ by constructing $O(\log^2 n)$ perfect matchings $M_1, \ldots, M_{O(\log^2 n)}$ between $O(\log^2 n)$ equal partitions of $V$ in an iterative manner. Our contribution along this line is to route each perfect matching on one of the $O(\log^2 n)$ (edge-disjoint) subgraphs of $G$. The “splitting procedure”, motivated by Karger’s theorem (Karger [1994]), simply assigns edges of $G$ uniformly at random into $O(\log^2 n)$ subgraphs. Using Karger’s arguments, we show that all cuts in each subgraph have approximately the correct size with high probability. Here we crucially use the polylogarithmic lower bound on the min-cut. We then route each matching $M_t$ on a unique split subgraph using a max-flow computation with unit capacities. Thus, we can route all $O(\log^2 n)$ matchings edge disjointly in $G$ and embed an expander graph $H$ integrally with congestion 1 on $G$.
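The splitting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_edges`, the edge-list representation, and the `seed` parameter are hypothetical and do not come from the thesis; the only behavior taken from the text is that each edge is assigned uniformly at random to one of the subgraphs.

```python
import random


def split_edges(edges, num_parts, seed=None):
    """Assign every edge independently and uniformly at random to one of
    num_parts edge-disjoint subgraphs (hypothetical helper, for illustration)."""
    rng = random.Random(seed)
    parts = [[] for _ in range(num_parts)]
    for edge in edges:
        # Each edge lands in exactly one part, so the parts are edge disjoint.
        parts[rng.randrange(num_parts)].append(edge)
    return parts


# Example: split the complete graph on 6 nodes into 3 subgraphs.
complete = [(u, v) for u in range(6) for v in range(u + 1, 6)]
subgraphs = split_edges(complete, 3, seed=0)
```

Because every edge belongs to exactly one part, any paths routed in different parts are automatically edge disjoint in the original graph.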

After  we  construct  such  an  expander  graph  H  for  each  G,  we  route  terminal 
pairs  in  H  greedily  via  short  paths.  This  is  effective  since  there  are  plenty  of  short 
disjoint  paths  in  an  expander  graph  Broder  et  al.  [1994];  Kleinberg  and  Rubinfeld 
[1996].  Since  a  node  in  H  maps  to  a  cluster  of  nodes  in  G  that  is  connected  by  a 
spanning  tree,  we  put  a  capacity  constraint  on  V{H):  we  allow  only  a  single  path 
to  go  through  each  node.  We  greedily  connect  a  pair  of  terminals  from  G  via  a  path 
in  H  while  taking  both  nodes  and  edges  along  the  chosen  path  away  from  H,  until 
no  short  paths  remain  between  any  unrouted  terminal  pair.  For  the  pairs  we  indeed 
route,  we  know  the  congestion  is  1  in  the  original  graph  G,  since  we  use  each 
edge  and  node  in  H  only  once,  and  edges  and  nodes  of  H  correspond  to  disjoint 
paths of $G$. We use a lemma in Garg et al. [1993] to show that such a greedy method ensures that we route a sufficiently large number of such pairs. We note that this method was proposed but analyzed somewhat differently by Kleinberg and Rubinfeld [1996]. Our analysis is more like that of Obata [2004], and yields somewhat stronger bounds. Our approximation factor is $O(\log^{10} n)$. (A breakdown of this factor is described in Theorem 6.2.)
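The greedy routing in $H$ can be sketched as follows. This is a minimal sketch, not the thesis's implementation: the names `bfs_path` and `greedy_route`, the adjacency-dictionary representation, and the length bound `max_len` are assumptions made for illustration. Since removing nodes only destroys paths, a single pass over the pairs already leaves no short path between any unrouted pair.

```python
from collections import deque


def bfs_path(adj, s, t, max_len):
    """Return a shortest s-t path with at most max_len edges, or None."""
    if s not in adj or t not in adj:
        return None
    parent = {s: None}
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        if dist[u] == max_len:
            continue  # do not extend paths beyond the length bound
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                dist[w] = dist[u] + 1
                queue.append(w)
    return None


def greedy_route(adj, pairs, max_len):
    """Greedily route pairs along short paths, deleting every used node
    (and hence its incident edges) so that each node carries one path."""
    adj = {u: set(nbrs) for u, nbrs in adj.items()}  # work on a copy
    routed = {}
    for s, t in pairs:
        path = bfs_path(adj, s, t, max_len)
        if path is None:
            continue
        routed[(s, t)] = path
        for u in path:
            for w in adj.pop(u):
                if w in adj:
                    adj[w].discard(u)
    return routed
```

On a 6-cycle with pairs (0, 3) and (1, 4) and `max_len=3`, only the first pair is routed, because either short path for it consumes a terminal of the second pair.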

4.1.1  Related  Work 

Much of the recent work on EDP has focused on understanding the polynomial-time approximability of the problem. Previously, constant or polylogarithmic approximation algorithms were known for trees with parallel edges Garg et al. [1993], expanders Kleinberg and Rubinfeld [1996]; Kolman and Scheideler [2001], grids and grid-like graphs Aumann and Rabani [1995]; Awerbuch et al. [1994]; Kleinberg and Tardos [1995a, b], and even-degree planar graphs Kleinberg [2005]. For general graphs, the best approximation ratio for EDP in directed graphs is $O(\min(n^{2/3}, \sqrt{m}))$ Chekuri and Khanna [2003]; Kleinberg [1996]; Kolliopoulos and Stein [1998]; Srinivasan [1997]; Varadarajan and Venkataraman [2004], where $m$ denotes the number of edges in the input graph. This is matched by the $\Omega(m^{1/2-\epsilon})$ hardness of approximation result by Guruswami et al. [1999]. For undirected and directed acyclic graphs, the upper bound has been improved to $O(\sqrt{n})$ Chekuri et al. [2006b]. For even-degree planar graphs, an $O(\log^2 n)$-approximation Kleinberg [2005] was obtained recently.

A variant is the EDP with Congestion (EDPwC) problem, where the goal is to route as many terminal pairs as possible, such that at most $\omega$ demands can go through any edge in the graph. For EDPwC on planar graphs, for $\omega = 2$ and $4$, $O(\log n)$ Chekuri et al. [2004b, 2005] and constant Chekuri et al. [2006a] approximations have been obtained respectively. For undirected graphs, the hardness results Andrews et al. [2005] are $\Omega(\log^{1/2-\epsilon} n)$ for EDP and $\Omega(\log^{(1-\epsilon)/(\omega+1)} n)$ for EDPwC.

A closely related problem is the congestion minimization problem: given a graph and a set of terminal pairs, connect all pairs with integral paths while minimizing the maximum number of paths through any edge. Raghavan and Thompson [1987] show that by applying randomized rounding to a linear relaxation of the problem one obtains an $O(\log n/\log\log n)$ approximation for both directed and undirected graphs. For hardness of approximation, Andrews and Zhang [2005a] show a result of $\Omega(\log\log^{1-\epsilon} m)$ for undirected graphs and an almost-tight result Andrews and Zhang [2006] of $\Omega(\log^{1-\epsilon} m)$ for directed graphs, improving that of $\Omega(\log\log m)$ by Chuzhoy and Naor [2004]. Finally, the All-or-Nothing Flow (ANF) problem Chekuri et al. [2004a, 2005] is to choose a subset of terminal pairs such that one can fractionally route a unit of flow for every chosen pair simultaneously. The hardness result for ANF and ANF with Congestion is the same as that of EDP and EDPwC Andrews et al. [2005]. Currently, there exists an $O(\log^2 n)$ approximation Chekuri et al. [2005] for ANF. Indeed, we build on the techniques developed in this approximation algorithm for ANF. This ratio directly contributes to our approximation factor.

We summarize these results in Table 4.1.

Problem | Directed? | Hardness of approximation | Upper bound
--------------------------------------------------------------------------------
MinCong | no  | $\Omega(\log\log^{1-\epsilon} m)$ Andrews and Zhang [2005a] | $O(\log n/\log\log n)$ Raghavan and Thompson [1987]
MinCong | yes | $\Omega(\log^{1-\epsilon} m)$ Andrews and Zhang [2006] | $O(\log n/\log\log n)$ Raghavan and Thompson [1987]
EDP     | yes | $\Omega(m^{1/2-\epsilon})$ Guruswami et al. [1999] | $O(\min(n^{2/3}, \sqrt{m}))$ Chekuri and Khanna [2003]; Kleinberg [1996]; Kolliopoulos and Stein [1998]; Srinivasan [1997]; Varadarajan and Venkataraman [2004]
EDP     | no  | $\Omega(\log^{1/2-\epsilon} n)$ Andrews et al. [2005] | $O(\sqrt{n})$ Chekuri et al. [2006b]
EDPwC   | no  | $\Omega(\log^{(1-\epsilon)/(\omega+1)} n)$ for $\omega = o(\log\log n/\log\log\log n)$ Andrews et al. [2005] |
EDPwC   | no  | superconstant for $\omega =$ Andrews et al. [2005] |
ANFwC   | no  | $\Omega(\log^{(1-\epsilon)/(\omega+1)} n)$ for $\omega = o(\log\log n/\log\log\log n)$ Andrews et al. [2005] | $O(\log^2 n)$ for $\omega = 1$ Chekuri et al. [2005]

Table 4.1. Hardness of approximations and upper bounds.
4.2 Definitions and Preliminaries

We work with a graph $G = (V, E)$ with unit-capacity edges, where we allow parallel edges, unless we specify a capacity function for edges explicitly. For a capacitated graph $G = (V, E, c)$, where $c$ is an integer capacity function on edges, one can replace each edge $e \in E$ with $c(e)$ parallel edges. For a cut $(S, \bar{S} = V \setminus S)$ in $G$, let $\delta_G(S)$, or simply $\delta(S)$ when it is clear, denote the set of edges with exactly one endpoint in $S$ in $G$. Let $\mathrm{cap}(S, \bar{S}) = |\delta_G(S)|$ denote the total capacity of edges in the cut. The edge expansion of a cut $(S, \bar{S})$, where $|S| \le |V|/2$, is $\phi(S) = \mathrm{cap}(S, \bar{S})/|S|$. The expansion of a graph $G$ is the minimum expansion over all cuts in $G$. We call a graph $G$ an expander if its expansion is at least a constant.
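To make the definition concrete, the sketch below computes the expansion of a small unit-capacity graph by brute force over all cuts. It assumes the standard form $\phi(S) = \mathrm{cap}(S, \bar{S})/|S|$ for $|S| \le |V|/2$; the function names are hypothetical, and the enumeration is exponential, so this is only suitable for tiny examples.

```python
from itertools import combinations


def cut_capacity(edges, side):
    """Number of unit-capacity edges with exactly one endpoint in side."""
    side = set(side)
    return sum(1 for u, v in edges if (u in side) != (v in side))


def graph_expansion(nodes, edges):
    """Minimum of cap(S, V \\ S) / |S| over all S with 1 <= |S| <= |V| / 2."""
    nodes = list(nodes)
    best = float("inf")
    for size in range(1, len(nodes) // 2 + 1):
        for side in combinations(nodes, size):
            best = min(best, cut_capacity(edges, side) / size)
    return best


# A 4-cycle: the sparsest cut separates two adjacent nodes from the rest.
four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(graph_expansion(range(4), four_cycle))  # 1.0
```

Parallel edges are handled naturally by listing an edge more than once.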

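An instance of a routing problem consists of a graph $\mathcal{G} = (V, E)$ and a set of terminal pairs $\mathcal{T} = \{(s_1, t_1), (s_2, t_2), \ldots, (s_k, t_k)\}$. Nodes in $\mathcal{T}$ are referred to as terminals. Given an EDP instance $(\mathcal{G}, \mathcal{T})$ with $k$ pairs of terminals, we use the following LP relaxation, as specified in (4.2.1), to obtain an optimal fractional solution. Let $\mathcal{P}_i$, $\forall i$, denote the set of paths joining $s_i$ and $t_i$ in $\mathcal{G}$.

\begin{align}
\max \quad & \sum_{i=1}^{k} x_i \qquad \text{s.t.} \tag{4.2.1} \\
& x_i - \sum_{p \in \mathcal{P}_i} f(p) = 0, \quad \forall\, 1 \le i \le k \tag{4.2.2} \\
& \sum_{p:\, e \in p} f(p) \le 1, \quad \forall e \in E \tag{4.2.3} \\
& x_i, f(p) \in [0, 1], \quad \forall\, 1 \le i \le k,\ \forall p \tag{4.2.4}
\end{align}

We let $\mathrm{OPT}^*(\mathcal{G}, \mathcal{T})$ be the value of this linear program for the optimal solution $f$ of the LP. In the text, where we always refer to a single instance, we primarily use $\mathrm{OPT}^*$.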
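The following sketch checks a candidate path-flow against constraints (4.2.2) to (4.2.4) and returns its objective value. It does not solve the LP; the function name, the list-based input layout, and the restriction to simple graphs (edges identified by their endpoints) are assumptions made for illustration.

```python
from collections import defaultdict


def fractional_edp_value(paths, flow, tol=1e-9):
    """paths[i] lists node sequences joining s_i and t_i; flow[i][j] is f(p)
    for p = paths[i][j]. Returns sum_i x_i, or raises ValueError if the
    flow violates the constraints of the relaxation."""
    load = defaultdict(float)
    value = 0.0
    for pair_paths, pair_flow in zip(paths, flow):
        x_i = 0.0
        for p, f in zip(pair_paths, pair_flow):
            if f < -tol or f > 1.0 + tol:
                raise ValueError("f(p) must lie in [0, 1]")
            x_i += f  # x_i = sum of f(p) over paths of pair i, as in (4.2.2)
            for u, v in zip(p, p[1:]):
                load[frozenset((u, v))] += f
        if x_i > 1.0 + tol:
            raise ValueError("x_i must lie in [0, 1]")
        value += x_i
    if any(l > 1.0 + tol for l in load.values()):
        raise ValueError("edge capacity exceeded")  # violates (4.2.3)
    return value


# 4-cycle 0-1-2-3-0 with pairs (0, 2) and (1, 3): half a unit on each path.
paths = [[[0, 1, 2], [0, 3, 2]], [[1, 2, 3], [1, 0, 3]]]
flow = [[0.5, 0.5], [0.5, 0.5]]
print(fractional_edp_value(paths, flow))  # 2.0
```

In this example the fractional value is 2, while any integral solution can route only one of the two pairs edge disjointly, which illustrates why the rounding step loses a factor.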

Given a non-negative weight function $\pi: X \to \mathbb{R}^+$ on a set of nodes $X$ in $G$, we use the following definitions from Chekuri et al. [2005].

Definition 4.1. (CKS2005 Chekuri et al. [2005]) $X$ is $\pi$-cut-linked in $G$ if $\forall S$ such that $\pi(S \cap X) = \sum_{x \in S \cap X} \pi(x) \le \pi(X)/2$, $|\delta(S)| \ge \pi(S \cap X)$. We also refer to $(G, X)$ as a $\pi$-cut-linked instance.

Definition 4.2. (CKS2005 Chekuri et al. [2005]) A set $X$ is $\pi$-flow-linked in $G$ if there is a feasible multicommodity flow for the problem with demand $\mathrm{dem}(u, v) = \pi(u)\pi(v)/\pi(X)$ between every unordered pair of terminals $u, v \in X$.

Remark 4.1. Note this is a product flow with $\mathrm{dem}(u, v) = w(u)w(v)$, where $w(u) = \pi(u)/\sqrt{\pi(X)}$.

We have the following proposition immediately from the definitions above.

Proposition 4.1. (CKS2005 Chekuri et al. [2005]) If a set $X$ is $\pi$-flow-linked in $G$, then it is $\pi/2$-cut-linked. If $X$ is $\pi$-cut-linked in $G$, then it is $\pi/\beta(G)$-flow-linked, where $\beta(G)$ is the worst-case mincut-maxflow gap on product multicommodity flow instances on $G$.

Definition 4.3. (CKS2005 Chekuri et al. [2005]) A set of nodes $X$ is well-linked in $G$ if $\forall S$ such that $|S \cap X| \le |X|/2$, $|\delta(S)| \ge |S \cap X|$.
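The cut-linked and well-linked conditions can be verified by brute force on small graphs, which may help when experimenting with the definitions. The sketch below is illustrative only: `is_cut_linked` is a hypothetical name, the graph is unit-capacity, and well-linkedness of $X$ is obtained as the special case where every node of $X$ has weight 1 and every other node has weight 0.

```python
from itertools import combinations


def is_cut_linked(nodes, edges, weight):
    """Check |delta(S)| >= pi(S ∩ X) for every S with pi(S ∩ X) <= pi(X) / 2.
    weight maps terminals to pi values; nodes missing from it have weight 0."""
    nodes = list(nodes)
    total = sum(weight.get(u, 0) for u in nodes)
    for size in range(1, len(nodes)):
        for side in combinations(nodes, size):
            s = set(side)
            w = sum(weight.get(u, 0) for u in s)
            if w <= total / 2:
                cut = sum(1 for u, v in edges if (u in s) != (v in s))
                if cut < w:
                    return False
    return True


path = [(0, 1), (1, 2), (2, 3)]
print(is_cut_linked(range(4), path, {0: 1, 3: 1}))               # True
print(is_cut_linked(range(4), path, {u: 1 for u in range(4)}))   # False
```

On the path, the set of all four nodes fails because the cut around {0, 1} has a single edge but weight 2, whereas the two endpoints alone are well-linked.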

4.3  Decomposition  of  the  Input  Instance 

In this section, we first present Theorem 4.2 regarding a preprocessing phase of our algorithm that decomposes and processes $(\mathcal{G}, \mathcal{T})$ into a collection of cut-linked instances with a min-cut $\Omega(\log^3 n)$ in each subgraph. We then state our main theorem with a breakdown of the polylog $n$ approximation factor. Finally, we give an outline of how we route terminal pairs in each cut-linked instance $(G, T)$. Note that we use $G$ to refer to a subgraph that we obtain through Theorem 4.2, starting from Section 6.1 until the end of the paper, while $\mathcal{G}$ refers to the original input graph. We first specify the following parameters.

- Parameters related to the original EDP instance

- $\omega \log n$ is the number of matchings as in Figure 6.1.1;

-  min-eut  K  =  fl(log^n)  =  »+i)^  where  s  <  1; 

- $\beta(\mathcal{G}) = O(\log n)$: as in Proposition 4.1 for $\mathcal{G}$;

- $\lambda(n) = 10\beta(\mathcal{G}) \log \mathrm{OPT}^*(\mathcal{G}, \mathcal{T}) = O(\log^2 n)$: as introduced in Theorem 5.1 in Chekuri et al. [2005].

Theorem 4.2. There is a polynomial-time decomposition algorithm that, given an EDP instance $(\mathcal{G}, \mathcal{T})$ where $\mathcal{G}$ has a min-cut of size $\Omega(\kappa \log^2 n)$, and a solution $f$ to the fractional EDP problem, with $x_i$, $\forall i$, being specified as in (4.2.1), produces a disjoint set of subgraphs and a weight function $\pi: V(\mathcal{G}) \to \mathbb{R}^+$ on $V(\mathcal{G})$ where

(1) there are $\alpha_1, \ldots, \alpha_k$ such that $\forall u$ in a subgraph $H$, $\pi(u) = \sum_{i:\, s_i = u,\, t_i \in H} \alpha_i x_i$ (note that this implies $\forall s_i t_i \in \mathcal{T}$, $x_i$ contributes the same amount of weight to $\pi(s_i)$ and $\pi(t_i)$);

(2) the set of nodes $V(H)$ in each subgraph $H$ is $\pi$-cut-linked in $H$;

(3) each subgraph $H$ has min-cut $\kappa = \Omega(\log^3 n)$;

(4) $\forall u$ in a subgraph $H$ s.t. $\pi(H) \ge \Omega(\log^3 n)$, $\pi(u) \le \sum_{i:\, s_i = u,\, t_i \in H}$

(5) and $\pi(V(\mathcal{G})) = \Omega(\mathrm{OPT}^*/(\beta(\mathcal{G})\lambda(n)))$.

The decomposition essentially says that, summing across all subgraphs $G$, a fair fraction of terminal pairs in $\mathcal{T}$ remain (conditions 4 and 5); indeed, we lose only a constant fraction of the terminal pairs of $\mathcal{T}$ (by assigning zero weight to the lost terminals). In addition, each subgraph $G$ is well connected with respect to $X$, the set of induced terminals of $\mathcal{T}$ in $G$, in the sense that $(G, X)$ is a $\pi$-cut-linked instance. This decomposition is essentially the same as Theorem 5.1 of Chekuri, Khanna, and Shepherd (Chekuri et al. [2005]). We need to do some additional work to ensure that the min-cut condition (condition 3) holds. We prove a flow-based version of the result in Section 5.1.

In particular, we sketch a proof of Theorem 5.2, which states a more refined and stronger version of Theorem 4.2. The actual proof of Theorem 5.2 is given in Section 5.3.

5 Obtaining a Cut-Linked Decomposition


5.1 An Outline of the Decomposition Procedure

In this section, we first sketch a proof of Theorem 5.2, which states a more refined and stronger version of Theorem 4.2. The actual proof of Theorem 5.2 is given in Section 5.3.

We first transform $(\mathcal{G}, \mathcal{T})$ to a set of flow-linked instances by following a decomposition procedure in CKS05 (Chekuri et al. [2005]), the outcome of which is summarized in the following theorem.

5.1.1 The CKS Flow-Linked Decomposition Theorem

Theorem 5.1. (CKS2005 Chekuri et al. [2005]) Let $\mathrm{OPT}^*(\mathcal{G}, \mathcal{T})$ be a solution to the LP for a given instance $(\mathcal{G}, \mathcal{T})$ of EDP in an input graph $\mathcal{G}$. One can efficiently compute a partition of $\mathcal{G}$ into node-disjoint induced subgraphs $G_1, G_2, \ldots, G_\ell$, and weight functions $\pi_i: V(G_i) \to \mathbb{R}^+$ with the following properties. Let $\mathcal{T}_i$ be the induced pairs of $\mathcal{T}$ in $G_i$ and let $X_i$ be the set of terminals of $\mathcal{T}_i$.

(1) $\pi_i(u) = \pi_i(v)$ for $uv \in \mathcal{T}_i$;

(2) $X_i$ is $\pi_i$-flow-linked in $G_i$;

(3) $\sum_{i=1}^{\ell} \pi_i(X_i) = \Omega(\mathrm{OPT}^*(\mathcal{G}, \mathcal{T})/\lambda(n))$, where $\lambda(n) = 10\beta(\mathcal{G}) \log \mathrm{OPT}^*(\mathcal{G}, \mathcal{T})$.

Remark 5.1. Although the statement of condition 1 in the CKS decomposition theorem assumes that each node $u$ belongs to only a single terminal pair in $\mathcal{T}$, their actual proof does not depend on such an assumption.

The proof of the theorem appears in CKS05 (Chekuri et al. [2005]). They use this procedure as the first step in a two-step transformation from the optimal multicommodity flow solution $f$ to obtain well-linked terminal sets, which eventually leads to an $O(\log^2 K)$-approximation for the ANF problem described in Section 4.1, where $K = |\mathcal{T}|$. We place details regarding this decomposition in Section 5.3. From now on, we refer to both $(G_i, \mathcal{T}_i)$ and $(G_i, X_i)$ as a $\pi_i$-flow-linked instance without differentiation.

5.1.2 Processing Subgraphs to Maintain Mincut Condition

We treat the induced subproblems independently. Given $(G_i, X_i)$ such that $X_i$ is $\pi_i$-flow-linked in $G_i$, there are two post-processing stages.

(1) Min-cut processing stage. Formally, let $V(G_i)$ be the current set of vertices of $G_i$. We keep cutting off the smaller side $S$ of a minimum cut, in terms of weight $\pi_i$, from $G_i$ when $\mathrm{cap}(S, V(G_i) \setminus S)$ is less than $c$, until every cut in $G_i$ is at least $c$, where we set $c = \Omega(\log^3 n)$.

By cutting off, we remove both the nodes in $S$ and the edges that are adjacent to $S$ in the current $G_i$; this includes the cases when we get rid of any single node whose degree falls below $c$ from its original degree of $\Omega(\log^5 n)$. We call such a stage a min-cut processing stage.

(2) Sparsest-cut processing stage. In order to guarantee that we have an instance $X_i'$ that is $\pi_i'$-flow-linked in $G_i$ for a new weight function $\pi_i'$, we need to further “mute” some terminals with a positive weight under $\pi_i$ by setting their weight to zero under $\pi_i'$. This way, we can guarantee that every cut in $G_i$ is good with respect to a product multicommodity flow demand that is defined based on the new weight function $\pi_i'$. We emphasize that we do not remove any nodes or edges in this stage; hence the min-cuts are guaranteed to be $\Omega(\log^3 n)$.
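The min-cut processing stage (stage 1 above) can be sketched as a simple loop. The sketch below is a toy version under stated assumptions: it finds a global minimum cut by brute-force enumeration, which is only feasible for very small graphs, and the names `min_cut` and `min_cut_processing` are hypothetical. A real implementation would use a polynomial-time min-cut algorithm instead.

```python
from itertools import combinations


def min_cut(nodes, edges):
    """Brute-force global minimum cut; returns (capacity, one side)."""
    nodes = sorted(nodes)
    best = None
    for size in range(1, len(nodes) // 2 + 1):
        for side in combinations(nodes, size):
            s = set(side)
            cap = sum(1 for u, v in edges if (u in s) != (v in s))
            if best is None or cap < best[0]:
                best = (cap, s)
    return best


def min_cut_processing(nodes, edges, weight, c):
    """Repeatedly cut off the lighter side (by weight) of a minimum cut
    while that cut has capacity below c."""
    nodes = set(nodes)
    edges = list(edges)
    removed = []
    while len(nodes) > 1:  # a single node has no cuts left to examine
        cap, side = min_cut(nodes, edges)
        if cap >= c:
            break
        other = nodes - side
        w_side = sum(weight.get(u, 0) for u in side)
        w_other = sum(weight.get(u, 0) for u in other)
        drop = side if w_side <= w_other else other
        removed.append(drop)
        nodes -= drop
        # Removing the nodes also removes every edge adjacent to them.
        edges = [(u, v) for u, v in edges if u in nodes and v in nodes]
    return nodes, edges, removed
```

For example, on a copy of $K_4$ with one pendant node attached by a single edge and $c = 2$, the loop removes the pendant node and then stops, since every remaining cut has capacity at least 3.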
5.1.3 A Modified Flow-Linked Decomposition Theorem

Therefore, we have the following theorem about the instances that we have by the end of this post-processing stage. The proof of this theorem is in Section 5.3.

Theorem 5.2. Let $\mathcal{G}$ be a graph with min-cut value $C^0 \ge (4a_0\lambda(n) + a_0 + 2)c$, for some $a_0 > 2$. By the end of the sparsest-cut processing, we obtain a set of node-disjoint induced subgraphs $G_1, \ldots, G_\ell$, all with min-cut $c$, and the corresponding disjoint subsets $\mathcal{T}_1', \ldots, \mathcal{T}_\ell'$ of $\mathcal{T}$, such that terminal pairs in $\mathcal{T}_i'$ belong to $G_i$ and there exists a set of weight functions $\pi_i': V(G_i) \to \mathbb{R}^+$ with the following properties. Let $X_i'$ be the set of terminals of $\mathcal{T}_i'$.

(1) there are $\alpha_1, \ldots, \alpha_k$ such that $\forall u$ in a subgraph $G_i$, $\pi_i'(u) = \sum_{j:\, s_j = u,\, t_j \in G_i} \alpha_j x_j$ (note that this implies $\forall s_j t_j \in \mathcal{T}_i'$, $x_j$ contributes the same amount of weight to $\pi_i'(s_j)$ and $\pi_i'(t_j)$);

(2) $X_i'$ is $\pi_i'$-flow-linked in $G_i$;

(3) $\forall u$ in a subgraph $G_i$ s.t. $\pi_i'(X_i') \ge \Omega(\log^3 n)$, $\pi_i'(u) \le$

(4) $\sum_{i=1}^{\ell} \pi_i'(X_i') = \Omega(\mathrm{OPT}^*(\mathcal{G}, \mathcal{T})/(\lambda(n)\beta(\mathcal{G})))$, where $\lambda(n) = 10\beta(\mathcal{G}) \log \mathrm{OPT}^*(\mathcal{G}, \mathcal{T})$ and $\beta(\mathcal{G})$ is the worst-case mincut-maxflow gap on product multicommodity flow instances on $\mathcal{G}$.

Finally, we define a weight function on $V(\mathcal{G})$ as follows: (a) $\forall i$, $\forall u \in G_i$, where $G_i$ is a subgraph of $\mathcal{G}$, we assign $\pi(u) = \pi_i'(u)/2$; and (b) we assign $\pi(u) = 0$ for nodes of $V(\mathcal{G})$ not in any $G_i$. We thus have defined the weight function $\pi: V(\mathcal{G}) \to \mathbb{R}^+$ on the entire set of nodes of $\mathcal{G}$ as required by Theorem 4.2, with the same decomposition as we obtain for Theorem 5.2.

5.2 Details Regarding CKS Flow-linked Decompositions

The following notation appears in the proof of Theorem 5.1 in Chekuri et al. [2005]. We inherit it in our proofs in Section 5.3. Let $H = (V(H), E(H))$ be a node-induced subgraph of $G = (V, E)$.

- $\gamma(\mathcal{G}) = \mathrm{OPT}^*(\mathcal{G}, \mathcal{T})$;

- $\gamma(H) = \sum_{P \in \mathcal{P}:\, P \subseteq H} f(P)$: the total flow induced in $H$ by the original flow $f$; it counts only the flow $f(P)$ on flow paths from the original flow path decomposition that are completely contained in $H$. $\mathcal{P}$ refers to the entire set of paths from the original flow decomposition.

- $\gamma(u, H)$: the flow in $H$ for $u$; hence $\gamma(H) = \frac{1}{2} \sum_{u \in V(H)} \gamma(u, H)$.

Recall the following results from their decomposition procedure. Let $G_1, G_2, \ldots, G_\ell$ be the subgraphs produced by the decomposition.

(1) If $\gamma(G_i) \le \lambda(n)/10$, assign $\pi_i(u) = \pi_i(v) = 1$ for some pair $uv \in \mathcal{T}_i$ with positive flow in $G_i$, and $\pi_i(y) = 0$ for $y \ne u, v$. Hence one can just route a unit flow between the chosen pair $uv$ along an integral path; such a path exists since $G_i$ is a connected component.

(2) Else, for $\gamma(G_i) > \lambda(n)/10$, $X_i$ is $\pi_i$-flow-linked in $G_i$, where $\pi_i$ is defined as follows for $G_i$. Recall $\lambda(n) = 10\beta(\mathcal{G}) \log \mathrm{OPT}^*(\mathcal{G}, \mathcal{T})$.

(a) $\pi_i(u) = \gamma(u, G_i)/\lambda(n)$, $\forall u \in X_i$;

(b) $\pi_i(u) = \gamma(u, G_i) = 0$ for $u \notin X_i$.

Remark 5.2. For both cases, the CKS weight function on $V(G_i)$ satisfies $\pi_i(G_i) = \Omega(\gamma(G_i)/\lambda(n))$, given that $\pi_i(G_i) = \sum_{x \in V(G_i)} \pi_i(x) = \sum_{x \in V(G_i)} \gamma(x, G_i)/\lambda(n) = 2\gamma(G_i)/\lambda(n)$; and the flow that one routes in $G_i$ satisfies the following two equivalent conditions.

(1) Define, $\forall uv \in V(G_i)$,

\[ \mathrm{dem}^{o}(u, v) = \frac{\gamma(u, G_i)\,\gamma(v, G_i)}{\gamma(G_i)} \tag{5.2.1} \]

as demands for the multicommodity product flow problem based on the original induced flow values $\gamma(u, G_i)$ at each node $u \in V(G_i)$ of $f$ in $G_i$; in $G_i$, $\forall i$, the concurrent max-flow value $f$ for product flow $\mathrm{dem}^{o}(u, v)$ satisfies

\[ f \ge f_0 = \frac{1}{2\lambda(n)}. \tag{5.2.2} \]

Thus $f_0\,\mathrm{dem}^{o}(u, v)$ units of demand can be simultaneously routed $\forall uv$ in $G_i$ with congestion 1.

(2) For a scaled-down product flow problem $\mathrm{dem}^{\pi}(u, v)$, such that each demand is $f_0$ of the original, $\forall uv \in V(G_i)$,

\[ \mathrm{dem}^{\pi}(u, v) = \frac{\pi_i(u)\,\pi_i(v)}{\pi_i(X_i)} = \frac{\gamma(u, G_i)\,\gamma(v, G_i)}{2\lambda(n)\,\gamma(G_i)} = \frac{\mathrm{dem}^{o}(u, v)}{2\lambda(n)} = f_0\,\mathrm{dem}^{o}(u, v), \tag{5.2.3} \]

there is a feasible flow in $G_i$ since the concurrent max-flow value is at least 1.

Depending on the context, we may prefer to use the original product flow $\mathrm{dem}^{o}(u, v)$ rather than the feasible product flow $\mathrm{dem}^{\pi}(u, v)$, or the other way around.

5.3 An Analysis on Postprocessing to Maintain Cut Conditions

The analysis of this section eventually leads to the proof of Theorem 5.2. Throughout this section, we keep reducing the set of terminal pairs of $\mathcal{T}_i$ that are relevant, in the sense that these pairs remain candidate pairs that we eventually route edge disjointly in $\mathcal{G}$. Therefore, we keep track of the following set of parameters in each subgraph $G_i$ that we obtain through flow decomposition:

- $\mathcal{T}_i$: the induced pairs of $\mathcal{T}$ in $G_i$ that we still consider routing edge disjointly.

- A weight function $\pi_i$ defined on $V(G_i)$, with positive values only on terminals $X_i$ of $\mathcal{T}_i$.

Finally, we use remaining-flow to keep track of the total remaining flows of $f$ between terminal pairs in $\mathcal{T}_i$, across all $i$; note that remaining-flow is the lower bound on

By the end of the CKS flow decomposition, $\mathcal{T}_i$ is the set of induced pairs of $\mathcal{T}$ in $G_i$. There exists at least one flow path between a pair of terminals $uv \in \mathcal{T}_i$, with a positive amount of flow, from the original flow path decomposition of $f$ that is entirely contained in $G_i$. We lose at most half of $f$, where $|f| = \mathrm{OPT}^*(\mathcal{G}, \mathcal{T})$, because the number of edges that were cut during flow decomposition is at most $\mathrm{OPT}^*/2 = \gamma(\mathcal{G})/2$; hence

\[ \text{remaining-flow} \ge \sum_{i=1}^{\ell} \gamma(G_i) \ge \mathrm{OPT}^*(\mathcal{G}, \mathcal{T})/2 = \gamma(\mathcal{G})/2, \tag{5.3.4} \]

and the total amount of the weights across all clusters is at least:

\[ \sum_{i=1}^{\ell} \pi_i(X_i) = \sum_{i=1}^{\ell} 2\gamma(G_i)/\lambda(n) = \Omega(\mathrm{OPT}^*(\mathcal{G}, \mathcal{T})/\lambda(n)). \tag{5.3.5} \]

We keep computing the amount of the original flow $f$ that we lose during the post-processing stages.

We specify the following parameters that are related to minimum cuts:

(1) $c$: the smallest minimum cut value that we allow in $G_i$, $\forall i$, which is $\Omega(\log^3 n)$.

(2) $C^0$: the minimum cut value in the original graph $\mathcal{G}$, which is $\Omega(\log^5 n)$.

(3) $\ell(S) = \mathrm{cap}(S, V \setminus S)$: size of a cut $(S, V \setminus S)$ in the original graph $\mathcal{G} = (V, E)$.

(4) $\mathrm{LOSS} \le \mathrm{OPT}^*(\mathcal{G}, \mathcal{T})/2$: number of edges that are cut during the CKS flow-decomposition process.

We analyze the minimum cut processing stage in the next two sections. Formally, let $V(G_i)$ be the current set of vertices of $G_i$. We keep cutting off the smaller side $S$ of a minimum cut, in terms of weight $\pi_i$, from $G_i$ when $\mathrm{cap}(S, V(G_i) \setminus S)$ is less than $c$, until every cut in $G_i$ is at least $c$. By cutting off, we remove both the nodes in $S$ and the edges that are adjacent to $S$ in the current $G_i$.

Let $S_i^1, S_i^2, \ldots, S_i^{\tau_i}$ be the sets of vertices that we take away from $G_i$, in that order. We define the following notation to track this process of updating $G_i$.

- $G_i^0 = (V_i^0, E_i^0)$: the subgraph $G_i$ before any of $S_i^t$, $t = 1, \ldots, \tau_i$, have been taken out.

- $X_i^0$: the set of terminals of $G_i^0$ right after flow decomposition, such that $X_i^0$ is $\pi_i$-flow-linked in $G_i^0$ as guaranteed by the CKS decomposition.

- $G_i^t = (V_i^t, E_i^t)$, $\forall t$: the remaining subgraph of $G_i^0$ after removing $S_i^1, \ldots, S_i^t$ and their adjacent edges; hence $V_i^t = V_i^0 \setminus \bigcup_{j=1}^{t} S_i^j$.

- $G_i = (V_i, E_i) = G_i^{\tau_i}$: the remaining subgraph of $G_i^0$ by the end of the min-cut processing stage.

5.3.1 Bound Edges Lost Due to Min-Cut Processing

Denote the number of edges that we take away from $G_i^0$ due to the min-cut processing by edge-loss$_i$, $\forall i$.

Definition 5.1. edge-loss$_i$ is the sum of capacities of the minimum cuts that have caused $S_i^1, \ldots, S_i^{\tau_i}$ to be cut off from $G_i$, $\forall i$. Denote the sum of edge-loss$_i$ across all $i$ by edge-loss,

\[ \text{edge-loss} = \sum_{i=1,2,\ldots} \text{edge-loss}_i = \sum_{i=1,2,\ldots} \sum_{t=1}^{\tau_i} \mathrm{cap}(S_i^t, V_i^t). \]

Remark 5.3. Note that the number of edges that we take away from the final set of nodes $V(G_i) = V_i^{\tau_i} = V_i^0 \setminus \bigcup_{t=1}^{\tau_i} S_i^t$ during the min-cut processing stage is upper bounded by, and in fact may be smaller than, edge-loss$_i$, $\forall i$.

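We prove the following lemma in this section.

Lemma 5.1. The total number of edges that we take away from the decomposed subgraphs $G_1^0, G_2^0, \ldots$ is at most

\[ \text{edge-loss} = \sum_{i=1,2,\ldots} \text{edge-loss}_i \le \frac{2\,\mathrm{LOSS} \cdot c}{C^0 - 2c}. \tag{5.3.6} \]

Proof. We use a potential function $\psi(G_i)$ to count the number of edges we lose from nodes currently in $G_i$, as compared to the original graph $\mathcal{G} = (V, E)$, while $G_i$ keeps shrinking due to its min-cut processing. The counting process is as follows. We start with a component $G_i^0$ such that $\psi_i^0 = \mathrm{LOSS}_i$ denotes the number of edges that we initially lose from nodes in $G_i^0$ right after the CKS flow decomposition procedure. Hence

\[ \psi_i^0 = \psi(G_i^0) = \mathrm{LOSS}_i \ge 0, \tag{5.3.7} \]

and

\[ \sum_{i=1,2,\ldots} \mathrm{LOSS}_i = 2\,\mathrm{LOSS}. \tag{5.3.8} \]

When a subset $S$ is cut off, it claims away some credit from the current $\psi(G_i)$, since $S$ is cut off because $\mathrm{cap}(S, V \setminus S)$ has decreased from above $C^0$ to its current size in $G_i$, $\mathrm{cap}(S, V(G_i) \setminus S) < c$, due to edges lost from nodes in $S$ during CKS flow decomposition. That is, the amount of edge loss from nodes in $S$ has contributed to the current value of $\psi(G_i)$.

Let $\psi_i^t$ be the value of $\psi(G_i)$ after taking $t$ sets of vertices $S_i^1, \ldots, S_i^t$ and their adjacent edges away from $G_i$. Let $(S_i^{t+1}, V_i^t \setminus S_i^{t+1})$ be the minimum cut in $G_i^t$, and hence $S_i^{t+1}$ be the $(t+1)$-th set of vertices that we cut off from $G_i$ because $\mathrm{cap}(S_i^{t+1}, V_i^t \setminus S_i^{t+1})$ is less than $c$. The amount of credit $S_i^{t+1}$ takes away from $\psi(G_i)$ is $\mathrm{cap}(S_i^{t+1}, V \setminus S_i^{t+1}) - \mathrm{cap}(S_i^{t+1}, V_i^t \setminus S_i^{t+1})$ and the credit it puts back is $\mathrm{cap}(S_i^{t+1}, V_i^{t+1})$, since we remove the edges in $(S_i^{t+1}, V_i^{t+1})$ from $G_i^t$, in addition to the subgraph induced by $S_i^{t+1}$ in $G_i^t$.

Let us denote the size of the original cut $(S_i^{t+1}, V \setminus S_i^{t+1})$ with

\[ \ell_i^{t+1} = \mathrm{cap}(S_i^{t+1}, V \setminus S_i^{t+1}) \ge C^0. \tag{5.3.9} \]

Hence, we update $\psi(G_i)$ as follows,

\begin{align*}
\psi_i^{t+1} &= \psi_i^t - \bigl(\mathrm{cap}(S_i^{t+1}, V \setminus S_i^{t+1}) - \mathrm{cap}(S_i^{t+1}, V_i^t \setminus S_i^{t+1})\bigr) + \mathrm{cap}(S_i^{t+1}, V_i^t \setminus S_i^{t+1}) \\
&= \psi_i^t - \bigl(\ell_i^{t+1} - \mathrm{cap}(S_i^{t+1}, V_i^{t+1})\bigr) + \mathrm{cap}(S_i^{t+1}, V_i^{t+1}).
\end{align*}

Since $\mathrm{cap}(S_i^{t+1}, V_i^{t+1}) < c$, we have

\[ \psi_i^{t+1} \le \psi_i^t - (\ell_i^{t+1} - c) + c. \tag{5.3.10} \]

Since the credit that a cut puts back is much less than the credit that it spent, there is only a finite number $\tau_i$ of such small cuts in $G_i$, $\forall i$. By the end of $\tau_i$ rounds, there must be a non-negative credit in $\psi(G_i)$, since nodes in the current $G_i$ can never gain any edges. Hence

\[ 0 \le \psi_i^{\tau_i} \le \mathrm{LOSS}_i - (\ell_i^1 - c) + c - (\ell_i^2 - c) + c - \ldots - (\ell_i^{\tau_i} - c) + c. \]

Summing the above inequalities over all $i$,

\[ \sum_{i=1,2,\ldots} \tau_i \cdot C^0 \le \sum_{i=1,2,\ldots} \sum_{t=1}^{\tau_i} \ell_i^t \le 2 \cdot \mathrm{LOSS} + 2 \sum_{i=1,2,\ldots} \tau_i \cdot c. \]

Hence the total number of minimum cuts across all $G_i$ that we process is

\[ \sum_{i=1,2,\ldots} \tau_i \le \frac{2\,\mathrm{LOSS}}{C^0 - 2c}. \tag{5.3.11} \]

Denote the sum of edge-loss$_i$ across all $i$ with edge-loss and thus

\begin{align}
\text{edge-loss} &= \sum_{i=1,2,\ldots} \text{edge-loss}_i \tag{5.3.12} \\
&= \sum_{i=1,2,\ldots} \sum_{t=1}^{\tau_i} \mathrm{cap}(S_i^t, V_i^t) \tag{5.3.13} \\
&\le \sum_{i=1,2,\ldots} \tau_i \cdot c \tag{5.3.14} \\
&\le \frac{2\,\mathrm{LOSS} \cdot c}{C^0 - 2c}. \tag{5.3.15}
\end{align}

□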

5.3.2  Bound  the  Flow  Lost  Due  to  Min-Cut  Processing 

Lemma  5,2,  The  total  flow  of  f  that  we  lose  from  min-cut  processing  is 

flow-lossi  <  — ^(2X(n)  -H  1/2).  (5.3.16) 

Proof  For  a  set  of  nodes  S\  G  Vt  =  1, . . .  ,x;,  in  G/  =  {V^^Ef),  we  denote  the 
size  of  eut  {S\,Vj^\S\)  with  =  cap(5'-,l/‘’ \  S-).  determines  the  amount  of 
flow  of  /  that  we  take  away  from  y(G,)  as  we  remove  S\  from  Gi  as  the  smaller 
side  of  a  min-cut  {S\,Vl)  in  G'“L 
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A  closer  examination  of  the  above  cutting  process  shows  that 

^  <  2edge-loss,-,  (5.3.17) 

and 

Y,  Y  <  2edge-loss,  (5.3.18) 

i — 1,2,...  t — 1,2,...  yXj 

since  the  edges  in  “Sf  come  either  from  previous  min-cuts:  {{Sj,V/)yj  <  t},  or 
from  new  edges  that  contribute  to  {(S'-jV/)};  in  addition,  each  edge  e  counted  in 
edge-loss,-  can  be  used  at  most  twice  toward  (Bj,  once  for  each  of  the  two 
neighboring  sets  in  {S'-,?  =  1, . . .  ,x,}  that  share  e  G  G^. 

Hence fix $\mathcal{B}_i^t$ for some $t$. We now calculate the amount of flow of $f$ that we lose by cutting off $S_i^t$. The flow that we lose falls into one of four types:

(1) flow whose paths are entirely contained in the subgraph of $G_i$ induced by $S_i^t$;

(2) flow that has to go through edges that are counted in $\mathcal{B}_i^t$, but not counted in $(S_i^t, V_i^t)$;

(3) flow that has to cross $(S_i^t, V_i^t)$ with at least one endpoint in $S_i^t$;

(4) flow with both endpoints $u, v \in V_i^t$ such that the flow path intersects the min-cut $(S_i^t, V_i^t)$ at least twice.

Flow of type 1 is counted in $\sum_{u \in S_i^t} \gamma(u, G_i)$ twice. Flow of type 2 has been counted before, when $S_i^j$ was cut off for some $j < t$. Flow of type 3 contributes its flow amount once to $\sum_{u \in S_i^t} \gamma(u, G_i)$ and once to the usage of $\mathrm{cap}(S_i^t, V_i^t)$. Flow of type 4 is counted twice in the usage of $\mathrm{cap}(S_i^t, V_i^t)$.

Note that flow that crosses the cut $(S_i^t, V_i^t)$ either has been counted in $\sum_{u \in S_i^t} \gamma(u, G_i)$ at least once or it crosses $(S_i^t, V_i^t)$ at least twice. Hence $\frac{1}{2}\bigl(\sum_{u \in S_i^t} \gamma(u, G_i) + \mathrm{cap}(S_i^t, V_i^t)\bigr)$ upper bounds the amount of flow that we lose from $f$, that has not been counted earlier, due to cutting off the induced subgraph of $S_i^t$ from $G_i^{t-1}$:

$$\begin{aligned}
\frac{1}{2}\sum_{u \in S_i^t} \gamma(u, G_i) + \frac{1}{2}\mathrm{cap}(S_i^t, V_i^t)
&\le \frac{1}{2}\pi_i(S_i^t \cap X_i^0)\,\lambda(n) + \frac{1}{2}\mathrm{cap}(S_i^t, V_i^t) \\
&\le \mathrm{cap}(S_i^t, V_i^0 \setminus S_i^t)\,\lambda(n) + \frac{1}{2}\mathrm{cap}(S_i^t, V_i^t) \\
&= \mathcal{B}_i^t\,\lambda(n) + \frac{1}{2}\mathrm{cap}(S_i^t, V_i^t),
\end{aligned}$$

where the second inequality is due to the fact that $X_i^0$ is $\pi_i$-flow-linked and Proposition 4.1, which implies that $X_i^0$ is $\pi_i/2$-cut-linked in $G_i^0$.
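Summing over all $S_i^t$, $\forall i$, $\forall t$, we obtain the total flow lost:

\begin{align}
\text{flow-loss}_1 &= \sum_{i=1,2,\ldots}\ \sum_{t=1,\ldots,x_i} \Bigl(\mathcal{B}_i^t\,\lambda(n) + \frac{1}{2}\mathrm{cap}(S_i^t, V_i^t)\Bigr) \tag{5.3.19} \\
&\le 2\,\text{edge-loss}\cdot\lambda(n) + \frac{1}{2}\,\text{edge-loss} \tag{5.3.20} \\
&\le \frac{2\,\mathrm{LOSS}\cdot c}{C^0 - 2c}\,\bigl(2\lambda(n) + 1/2\bigr). \tag{5.3.21}
\end{align}

□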


Let $1/\alpha_0$ denote the ratio of the amount of flow of $f$ that we lose during min-cut processing with respect to LOSS in the CKS flow decomposition:

$$\frac{\text{flow-loss}_1}{\mathrm{LOSS}} \le \frac{1}{\alpha_0}. \tag{5.3.22}$$

Thus we require

$$\frac{\text{flow-loss}_1}{\mathrm{LOSS}} \le \frac{(2\lambda(n) + 1/2)\cdot 2c}{C^0 - 2c} = \frac{(4\lambda(n) + 1)\cdot c}{C^0 - 2c} \le \frac{1}{\alpha_0}. \tag{5.3.23}$$

Given an $\alpha_0$, in order to satisfy (5.3.23), we require

$$C^0 \ge (4\alpha_0\lambda(n) + \alpha_0 + 2)\cdot c. \tag{5.3.24}$$

Plugging (5.3.24) into (5.3.12), we obtain the following bound on edge loss due to post-processing of $G_i$:

$$\text{edge-loss} \le \frac{2\,\mathrm{LOSS}\cdot c}{C^0 - 2c} \le \frac{2\,\mathrm{LOSS}\cdot c}{\alpha_0(4\lambda(n) + 1)\cdot c} = \frac{\mathrm{LOSS}}{\alpha_0\bigl(2\lambda(n) + \frac{1}{2}\bigr)}. \tag{5.3.25}$$
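The arithmetic linking (5.3.23), (5.3.24) and (5.3.25) can be checked with a small sketch. The function names and the sample values are illustrative assumptions, not part of the original analysis; `lam` stands for $\lambda(n)$ and `C0` for $C^0$.

```python
from fractions import Fraction


def min_cut_requirement(alpha0, lam, c):
    """Right-hand side of (5.3.24): the smallest admissible C^0."""
    return (4 * alpha0 * lam + alpha0 + 2) * c


def flow_loss_ratio_bound(C0, lam, c):
    """Middle expression of (5.3.23): (4*lambda(n) + 1) * c / (C^0 - 2c)."""
    return Fraction((4 * lam + 1) * c, C0 - 2 * c)


def edge_loss_bound(loss, C0, c):
    """First bound in (5.3.25): 2 * LOSS * c / (C^0 - 2c)."""
    return Fraction(2 * loss * c, C0 - 2 * c)


# Hypothetical sample values, chosen only to exercise the formulas.
alpha0, lam, c, loss = 4, 10, 7, 1000
C0 = min_cut_requirement(alpha0, lam, c)
ratio = flow_loss_ratio_bound(C0, lam, c)
edge_loss = edge_loss_bound(loss, C0, c)
```

When $C^0$ meets (5.3.24) with equality, the ratio in (5.3.23) equals $1/\alpha_0$ exactly, and the edge-loss bound coincides with the last expression in (5.3.25).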


5.3.3  Obtain  the  Final  Set  of  Terminals 

Recall that $G_i^0 = (V_i^0, E_i^0)$, $\forall i$, denote the subgraphs $G_i$ that we obtain through the CKS flow decomposition before any subset of nodes has been removed; $G_i = (V_i, E_i)$, $\forall i$, are the remaining subgraphs of $G_i^0$, $\forall i$, at the end of the min-cut processing stage. By (5.3.23), the total flow of $f$ that remains is the sum of the flow of $f$ induced in $G_i$, across all $i$:

\begin{align}
\text{remaining-flow} &= \sum_{i} \gamma(G_i) \tag{5.3.26} \\
&\ge \frac{\mathrm{OPT}^*(\mathcal{G}, \mathcal{T})}{2} - \text{flow-loss}_1 \tag{5.3.27} \\
&\ge \frac{1}{2}\,\mathrm{OPT}^*(\mathcal{G}, \mathcal{T})\Bigl(1 - \frac{1}{\alpha_0}\Bigr), \tag{5.3.28}
\end{align}

where $\text{flow-loss}_1 \le \mathrm{LOSS}/\alpha_0$ and $\mathrm{LOSS} \le \mathrm{OPT}^*(\mathcal{G}, \mathcal{T})/2$.

In the sparsest-cut processing, we remove from the graph $G_i$ the regions $P_i^1, \ldots, P_i^{y_i}$ that do not meet a certain sparsest cut condition. In the end, we have a subgraph $G'_i$ that does meet the sparsest cut condition on the demands in the remaining subgraph. Now we assign a zero weight to all vertices in the removed regions to zero out demands on these regions and put $P_i^1, \ldots, P_i^{y_i}$ all back in. This graph $G_i$ is only more connected with regard to the non-removed demand induced by $f$ inside $G'_i$, $\forall i$. Hence we emphasize that the $G_i$, $\forall i$, are the set of subgraphs that we pass on to the next stage. We give an algorithm for computing the final disjoint subsets $\mathcal{T}'_1, \mathcal{T}'_2, \ldots$ of $\mathcal{T}$ such that terminal pairs in $\mathcal{T}'_i$ belong to $G'_i$, and hence $G_i$, $\forall i$, and for assigning a positive weight to the set of terminals in $\mathcal{T}'_i$. In the rest of this section, we prove Theorem 5.2.

Proof of Theorem 5.2: Given a subgraph $G_i = (V_i, E_i)$, we use the procedure as in Figure 5.3.3 to update $G_i$ recursively by muting regions that do not satisfy the sparsest cut condition; by "muting" a region $P$, we treat nodes in $P$ and their adjacent edges as if they were removed from $G_i$ during the sparsest-cut processing stage, although in the end, we retain these regions entirely in $G_i$.

0. Given a subgraph $G_i$.

1. If $\gamma(G_i) < (\alpha_1/4)\beta\lambda(n)$, set $\pi'_i(u) = \pi'_i(v) = 1$ for some pair $uv \in \mathcal{T}_i$ with positive flow in $G_i$, and $\pi'_i(y) = 0$ for $y \ne u, v$.
   Hence we can just route a unit flow between the chosen pair $uv \in \mathcal{T}_i$ along an integral path; such a path exists since $G_i$ is a connected component.

2. Suppose that $\gamma(G_i) \ge (\alpha_1/4)\beta\lambda(n)$. For $\mathrm{dem}(u, v) = \gamma(u, G_i)\gamma(v, G_i)/\gamma(G_i)$, let $f$ be the maximum concurrent flow for this instance.

   (a) if $f \ge f_1$, set $\pi'_i(u) = \frac{\gamma(u, G_i)}{(\alpha_1/2)\beta\lambda(n)}$, $\forall u$;

   (b) else $f < f_1$, find an approximate sparsest cut $(S, V_i \setminus S)$ such that $\delta(S) \le \mathrm{dem}(S, V_i \setminus S)\,\beta f_1$;
       set $\pi'_i(u) = 0$, $\forall u \in S$, and
       shut off edges in $\delta^0(S) = (S, V_i \setminus S)$
       so that we recurse on $G_i[V(G_i) \setminus S]$.

3. End

Figure 5.3.1. Algorithm FINDING SPARSEST CUTS

We define the following parameters given a remaining subgraph $G_i^t$ of $G_i$ after muting some regions:

(1) $G_i^t$ = the remaining subgraph of $G_i$ after muting nodes in $P_i^1, \ldots, P_i^t$ and their adjacent edges. $V_i^t = V_i \setminus \bigcup_{j \le t} P_i^j$ is the remaining set of vertices in $G_i$ at stage $t$.

(2) $\delta^t(S) = \mathrm{cap}(S, V_i^t \setminus S)$ denotes the size of the cut $(S, V_i^t \setminus S)$ in subgraph $G_i^t$.

(3) $\Delta(S) = \mathrm{cap}(S, V_i^0 \setminus S)$ denotes the size of the cut $(S, V_i^0 \setminus S)$ in subgraph $G_i^0$.

Given $G_i^t$, we try to route the following multicommodity product flow between any unordered pair of vertices $u, v$:

$$\mathrm{dem}^t(u, v) = \frac{\gamma(u, G_i^t)\,\gamma(v, G_i^t)}{\gamma(G_i^t)}, \tag{5.3.29}$$

where $\gamma(u, G_i^t)$ is the flow of $f$ at node $u \in V_i^t$ that is induced in $G_i^t$.

We define $f_1 = \frac{1}{\alpha_1\beta(G)\lambda(n)}$, where $\alpha_1 > 8$, as the minimum concurrent flow value that one needs to obtain for $\mathrm{dem}^t(u, v)$ in order for subgraph $G_i^t$ to satisfy the flow-linked property. When the actual flow value $f^t < f_1$, we can find a set $P_i^{t+1}$ such that $\delta^t(P_i^{t+1}) \le \mathrm{dem}^t(P_i^{t+1}, V_i^t \setminus P_i^{t+1})\,\beta f_1$. We say $P_i^{t+1}$ does not meet the sparsest cut condition for the demands $\mathrm{dem}^t(u, v)$ in subgraph $G_i^t$, and we mute $P_i^{t+1}$ in $G_i^t$ and recurse on $G_i[V_i^t \setminus P_i^{t+1}] = G_i^{t+1}$. When the flow value $f^t \ge f_1$, we stop the recursion, and assign $\pi'_i(u) = \frac{\gamma(u, G_i^t)}{(\alpha_1/2)\beta\lambda(n)}$ for all $u \in V_i^t$.

Let $G'_i = G_i^{y_i} = (V_i^{y_i}, E_i^{y_i})$ be the remaining subgraph of $G_i$ by the end of sparsest-cut processing, after muting nodes in $P_i^1, \ldots, P_i^{y_i}$ and their adjacent edges. Let the set of terminal pairs $\mathcal{T}'_i$ be the subset of $\mathcal{T}_i$ that are contained in subgraph $G'_i$ and let $X'_i$ be the set of terminals of $\mathcal{T}'_i$.

If $\gamma(G'_i) < (\alpha_1/4)\beta\lambda(n)$ when the algorithm terminates, we have obtained a terminal pair in $\mathcal{T}'_i$ to route in $G'_i$ and a weight assignment that satisfy all three conditions in the theorem, and we are done with this subgraph.

When $\gamma(G'_i) \ge (\alpha_1/4)\beta\lambda(n)$, a product flow based on the flow of $f$ induced in $G'_i$ is routable with throughput at least $f_1 = \frac{1}{\alpha_1\beta(G)\lambda(n)}$, where $\alpha_1 > 8$. Hence by assigning a new weight

$$\pi'_i(u) = \frac{\gamma(u, G'_i)}{(\alpha_1/2)\beta\lambda(n)}, \tag{5.3.30}$$

for all $u \in V(G'_i)$, and $\pi'_i(u) = 0$ for all other nodes $u \in V_i$ in $G_i$, we can define a multicommodity flow problem, where for any unordered pair of vertices $u, v \in V(G'_i)$, $\mathrm{dem}'(u, v) = \pi'_i(u)\pi'_i(v)/\pi'_i(X'_i)$, that is feasible in both $G'_i$ and $G_i$. Hence $X'_i$ is $\pi'_i$-flow-linked in $G_i$, $\forall i$. Finally, we put $P_i^t$, $\forall t$, back in $G_i$ with zero node weight, while retaining the same weight assignment for nodes in $G'_i$. Hence the sum of the total weight is:

$$\pi'_i(G_i) = \pi'_i(G'_i) = \sum_{u \in G'_i} \frac{\gamma(u, G'_i)}{(\alpha_1/2)\beta\lambda(n)} = \frac{\gamma(G'_i)}{(\alpha_1/4)\beta\lambda(n)}. \tag{5.3.31}$$


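Hence for both terminating conditions of the algorithm, we have

$$\pi'_i(G_i) \ge \frac{\gamma(G'_i)}{(\alpha_1/4)\beta\lambda(n)}. \tag{5.3.32}$$

Hence

$$\sum_{i} \pi'_i(G_i) \ge \sum_{i} \frac{\gamma(G'_i)}{(\alpha_1/4)\beta\lambda(n)} \ge \frac{\mathrm{OPT}^*(\mathcal{G}, \mathcal{T})}{\alpha_1\beta\lambda(n)},$$

where $\sum_{i} \gamma(G'_i) \ge \mathrm{OPT}^*(\mathcal{G}, \mathcal{T})/4$ by taking $\alpha_0 = 4$ and $\alpha_1 = 16$ in Lemma 5.3 and requiring that $C^0 \ge 16\lambda(n)c$. □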

Hence by the end of sparsest-cut processing, we get a new instance $X'_i$ on $G_i = (V_i, E_i)$ with min-cut at least $c = \Omega(\log^3 n)$, such that $X'_i$ is $\pi'_i$-flow-linked in $G_i$, which can be only more connected than $G'_i$. We tune two parameters, $\alpha_0$ and $\alpha_1$, to balance the initial node degree requirement and the amount of flow of $f$ that we retain by the end of the min-cut and sparsest-cut processing stages.

Lemma 5.3. Given a graph $\mathcal{G}$ with min-cut value $C^0 \ge (4\alpha_0\lambda(n) + \alpha_0 + 2)c$, where $\alpha_0 > 2$, by the end of sparsest-cut processing, the total amount of flow of $f$ that we will pass on to the next stage of the algorithm for finding EDP in $\mathcal{G}$ is the sum of the flow of $f$ induced in $G'_i$, across all $i$,

$$\sum_{i=1,2,\ldots} \gamma(G'_i) \ge \frac{1}{2}\,\mathrm{OPT}^*(\mathcal{G}, \mathcal{T})\Bigl(1 - \frac{1}{\alpha_0} - \frac{1}{2\alpha_0(1 - 8/\alpha_1)}\Bigr), \tag{5.3.33}$$

where $\alpha_1 > 8$.
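As a quick sanity check of the bound in Lemma 5.3, the following sketch evaluates the factor that multiplies $\mathrm{OPT}^*$ for given $\alpha_0$ and $\alpha_1$. The function name is a hypothetical helper introduced here for illustration only.

```python
from fractions import Fraction


def retained_fraction(alpha0, alpha1):
    """Factor multiplying OPT* in (5.3.33); only meaningful for alpha1 > 8."""
    if alpha1 <= 8:
        raise ValueError("alpha1 must exceed 8")
    a0, a1 = Fraction(alpha0), Fraction(alpha1)
    # (1/2) * (1 - 1/alpha0 - 1 / (2 * alpha0 * (1 - 8/alpha1)))
    return Fraction(1, 2) * (1 - 1 / a0 - 1 / (2 * a0 * (1 - 8 / a1)))


quarter = retained_fraction(4, 16)
```

With $\alpha_0 = 4$ and $\alpha_1 = 16$ the factor is exactly $1/4$, which is the choice used in the proof of Theorem 5.2; larger values of either parameter retain more flow at the price of a larger min-cut requirement.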


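Proof. In the beginning of the sparsest-cut processing stage, we have

\begin{align}
\text{remaining-flow} &= \sum_{i=1,2,\ldots} \gamma(G_i) \tag{5.3.34} \\
&\ge \frac{1}{2}\,\mathrm{OPT}^*(\mathcal{G}, \mathcal{T})\Bigl(1 - \frac{1}{\alpha_0}\Bigr). \tag{5.3.35}
\end{align}

Combining this initial condition with Lemma 5.4, we have

$$\text{remaining-flow} = \sum_{i=1,2,\ldots} \gamma(G'_i) \ge \frac{1}{2}\,\mathrm{OPT}^*(\mathcal{G}, \mathcal{T})\Bigl(1 - \frac{1}{\alpha_0} - \frac{1}{2\alpha_0(1 - 8/\alpha_1)}\Bigr), \tag{5.3.36}$$

where $\mathrm{LOSS} \le \mathrm{OPT}^*(\mathcal{G}, \mathcal{T})/2$. □

Lemma 5.4. The amount of flow that we lose from $\sum_{i} \gamma(G_i)$ due to sparsest-cut processing is $\text{flow-loss}_2 \le \frac{\mathrm{LOSS}}{2\alpha_0(1 - 8/\alpha_1)}$, where $\alpha_1 > 8$.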

Proof. To analyze the amount of flow that we lose from sparsest-cut processing, we use a potential function $\varphi(G_i)$ to keep track of the edges of $G_i^0$ that we take away from nodes currently in $G_i$, after min-cut and during sparsest-cut processing. Note that those lost edges connect to other nodes in $G_i^0$ from nodes internal to $G_i^t$ at stage $t$.

The counting process is the following. We start with a component $G_i$ such that some nodes in $G_i$ have lost some of their edges right after min-cut processing and

$$\varphi^0 = \text{edge-loss}_i \ge 0.$$

Let $\varphi^t$ be the value of $\varphi(G_i)$ after removing $t$ sets of vertices $P_i^1, \ldots, P_i^t$ and their adjacent edges from $G_i$. Let $P_i^{t+1}$ be the $(t+1)$st set of vertices that we shut off from $G_i$ because the internal boundary capacity of $P_i^{t+1}$ has decreased from $\Delta(P_i^{t+1})$ to

$$\delta^t(P_i^{t+1}) \le \mathrm{dem}^t(P_i^{t+1}, V_i^t \setminus P_i^{t+1})\,\beta f_1.$$

We update $\varphi(G_i)$ as follows:

$$\varphi^{t+1} = \varphi^t - \bigl(\Delta(P_i^{t+1}) - \delta^t(P_i^{t+1})\bigr) + \delta^t(P_i^{t+1}).$$

Since the credit that a cut puts back is less than the credit that it spent, there is only a finite number $y_i$ of such small cuts. By the end of $y_i$ rounds, there must be non-negative credit in $\varphi(G_i)$, since nodes in the current $G_i$ can never gain any internal edges:

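$$\begin{aligned}
\varphi(G_i) = \varphi^{y_i}
&= \text{edge-loss}_i - \bigl(\Delta(P_i^1) - \delta^0(P_i^1)\bigr) + \delta^0(P_i^1) - \bigl(\Delta(P_i^2) - \delta^1(P_i^2)\bigr) + \delta^1(P_i^2) \\
&\quad - \cdots - \bigl(\Delta(P_i^{y_i}) - \delta^{y_i - 1}(P_i^{y_i})\bigr) + \delta^{y_i - 1}(P_i^{y_i}) \\
&\ge 0.
\end{aligned}$$

Hence by summing the above inequalities over all $i$,

\begin{align}
\sum_{i=1,2,\ldots}\ \sum_{t=1,\ldots,y_i} \bigl(\Delta(P_i^t) - 2\delta^{t-1}(P_i^t)\bigr) &\le \sum_{i=1,2,\ldots} \text{edge-loss}_i \tag{5.3.37} \\
&= \text{edge-loss} \tag{5.3.38} \\
&\le \frac{\mathrm{LOSS}}{\alpha_0\bigl(2\lambda(n) + \frac{1}{2}\bigr)}. \tag{5.3.39}
\end{align}

Fix $P_i^{t+1}$ for some $i$ and $t \in [0, \ldots, y_i - 1]$; we have the following two lemmas on $\Delta(P_i^{t+1})$ and $\delta^t(P_i^{t+1})$.

Lemma 5.5. For all $i$ and all $t \in [0, \ldots, y_i - 1]$,

$$\Delta(P_i^{t+1}) = \mathrm{cap}(P_i^{t+1}, V_i^0 \setminus P_i^{t+1}) \ge \frac{\sum_{u \in P_i^{t+1}} \gamma(u, G_i^t)}{2\lambda(n)}.$$

Lemma 5.6. For all $i$ and all $t \in [0, \ldots, y_i - 1]$, $\delta^t(P_i^{t+1}) \le \frac{4}{\alpha_1}\Delta(P_i^{t+1})$.

Plugging Lemma 5.6 into (5.3.37), we get

$$\sum_{i=1,2,\ldots}\ \sum_{t=1,\ldots,y_i} \Delta(P_i^t)\Bigl(1 - \frac{8}{\alpha_1}\Bigr) \le \sum_{i=1,2,\ldots}\ \sum_{t=1,\ldots,y_i} \bigl(\Delta(P_i^t) - 2\delta^{t-1}(P_i^t)\bigr) \le \frac{\mathrm{LOSS}}{\alpha_0\bigl(2\lambda(n) + \frac{1}{2}\bigr)}.$$

Hence

$$\sum_{i=1,2,\ldots}\ \sum_{t=1,\ldots,y_i} \Delta(P_i^t) \le \frac{\mathrm{LOSS}}{\alpha_0\bigl(2\lambda(n) + \frac{1}{2}\bigr)(1 - 8/\alpha_1)}. \tag{5.3.40}$$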

Fix $G_i^t = (V_i^t, E_i^t)$ for some $t \in [1, \ldots, y_i]$. We now calculate the amount of flow of $f$ that we lose from $\sum_{i=1,2,\ldots} \gamma(G_i)$ by shutting off $P_i^{t+1}$ in $G_i^t$. The flow that we lose falls into one of four types:

(1) its path is entirely contained in the subgraph of $G_i$ induced by nodes in $P_i^{t+1}$;

(2) its path contains edges counted in $\Delta(P_i^{t+1})$ but not those in $\delta^t(P_i^{t+1})$;

(3) its path contains edges counted in $\delta^t(P_i^{t+1})$, with at least one endpoint in $P_i^{t+1}$;

(4) flow with both endpoints $u, v \in V_i^{t+1}$, such that its path intersects edges counted in $\delta^t(P_i^{t+1})$ at least twice.

Flow of type 1 contributes to the sum $\sum_{u \in P_i^{t+1}} \gamma(u, G_i^t)$ twice. Flow of type 3 contributes its flow value to $\sum_{u \in P_i^{t+1}} \gamma(u, G_i^t)$ once and to the usage of $\mathrm{cap}(P_i^{t+1}, V_i^{t+1}) = \delta^t(P_i^{t+1})$ at least once. Flow of type 4 contributes its flow amount at least twice to the usage of $\mathrm{cap}(P_i^{t+1}, V_i^{t+1})$. Flow of type 2 has been counted before, when $P_i^j$ was muted for some $j < t$ from $G_i$. Note that flow that crosses $(P_i^{t+1}, V_i^{t+1})$ either has been counted in $\sum_{u \in P_i^{t+1}} \gamma(u, G_i^t)$ at least once or it goes through the cut $(P_i^{t+1}, V_i^{t+1})$ in $G_i^t$ at least twice.

Hence the total amount of flow of $f$ that we lose from $\gamma(G_i)$, that has not been counted in stages earlier than $t$, by muting the induced subgraph of $P_i^{t+1}$ and its adjacent edges in $G_i^t$, is at most

$$\begin{aligned}
\frac{1}{2}\Bigl(\sum_{u \in P_i^{t+1}} \gamma(u, G_i^t) + \mathrm{cap}(P_i^{t+1}, V_i^{t+1})\Bigr)
&= \frac{1}{2}\sum_{u \in P_i^{t+1}} \gamma(u, G_i^t) + \frac{1}{2}\delta^t(P_i^{t+1}) \\
&\le \Delta(P_i^{t+1})\,\lambda(n) + \frac{1}{2}\delta^t(P_i^{t+1}) \\
&\le \Delta(P_i^{t+1})\bigl(\lambda(n) + 2/\alpha_1\bigr),
\end{aligned}$$

where the last two inequalities are due to Lemmas 5.5 and 5.6.

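Summing over all $P_i^t$, $\forall i$, $\forall t$, given that $\alpha_1 > 8$, the total flow lost in the sparsest-cut processing stage is

$$\begin{aligned}
\text{flow-loss}_2 &= \sum_{i=1,2,\ldots}\ \sum_{t=1,\ldots,y_i} \Bigl(\Delta(P_i^t)\,\lambda(n) + \frac{1}{2}\delta^{t-1}(P_i^t)\Bigr) \\
&\le \sum_{i=1,2,\ldots}\ \sum_{t=1,\ldots,y_i} \Delta(P_i^t)\bigl(\lambda(n) + 2/\alpha_1\bigr) \\
&\le \frac{(\lambda(n) + 2/\alpha_1)\,\mathrm{LOSS}}{2\alpha_0(\lambda(n) + 1/4)(1 - 8/\alpha_1)} \\
&\le \frac{\mathrm{LOSS}}{2\alpha_0(1 - 8/\alpha_1)}.
\end{aligned}$$

□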

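Proof of Lemma 5.5: Given that $\sum_{u \in P_i^{t+1}} \gamma(u, G_i^t) \le \sum_{v \in V_i^{t+1}} \gamma(v, G_i^t)$ and $\sum_{u \in P_i^{t+1}} \gamma(u, G_i^t) + \sum_{v \in V_i^{t+1}} \gamma(v, G_i^t) = \sum_{u \in V_i^t} \gamma(u, G_i^t)$, we have

$$\sum_{u \in P_i^{t+1}} \gamma(u, G_i^t) \le \frac{1}{2}\sum_{u \in V_i^t} \gamma(u, G_i^t). \tag{5.3.41}$$

Next let us define $a_2$ as the additional flow of $f$ for nodes $u \in P_i^{t+1}$ in $G_i^0$ as compared to that in subgraph $G_i^t$,

$$\sum_{u \in P_i^{t+1}} \gamma(u, G_i^0) = \sum_{u \in P_i^{t+1}} \gamma(u, G_i^t) + a_2. \tag{5.3.42}$$

Since each unit of flow of $a_2$ uses at least one unit of capacity from edges that connect the two sets of vertices $P_i^{t+1}$ and $V_i^0 \setminus V_i^t$ in $G_i^0$, we have

$$a_2 \le \Delta(P_i^{t+1}) - \delta^t(P_i^{t+1}). \tag{5.3.43}$$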


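In addition, we know that

$$\sum_{u \in X_i^0} \gamma(u, G_i^0) \ge \sum_{u \in V_i^0} \gamma(u, G_i^0) \ge \sum_{u \in V_i^t} \gamma(u, G_i^t) + a_2.$$

Thus we have

\begin{align}
\sum_{u \in P_i^{t+1}} \gamma(u, G_i^0) &= \sum_{u \in P_i^{t+1}} \gamma(u, G_i^t) + a_2 \tag{5.3.44} \\
&\le \frac{\sum_{u \in V_i^t} \gamma(u, G_i^t)}{2} + a_2 \tag{5.3.45} \\
&= \frac{\sum_{u \in V_i^t} \gamma(u, G_i^t) + a_2}{2} + \frac{a_2}{2} \tag{5.3.46} \\
&\le \sum_{u \in V_i^0} \gamma(u, G_i^0)/2 + \frac{a_2}{2} \tag{5.3.47} \\
&\le \sum_{u \in X_i^0} \gamma(u, G_i^0)/2 + \frac{a_2}{2}. \tag{5.3.48}
\end{align}

When $\sum_{u \in P_i^{t+1}} \gamma(u, G_i^0) \le \frac{1}{2}\sum_{u \in X_i^0} \gamma(u, G_i^0)$, cut-linkedness applies to $P_i^{t+1}$ directly and

$$\Delta(P_i^{t+1}) = \mathrm{cap}(P_i^{t+1}, V_i^0 \setminus P_i^{t+1}) \ge \frac{\sum_{u \in P_i^{t+1}} \gamma(u, G_i^0)}{2\lambda(n)} \ge \frac{\sum_{u \in P_i^{t+1}} \gamma(u, G_i^t)}{2\lambda(n)}.$$

Otherwise, $\sum_{u \in P_i^{t+1}} \gamma(u, G_i^0) > \frac{1}{2}\sum_{u \in X_i^0} \gamma(u, G_i^0)$; we have

$$\sum_{u \in V_i^0 \setminus P_i^{t+1}} \gamma(u, G_i^0) \ge \sum_{u \in V_i^0} \gamma(u, G_i^0)/2 - a_2/2 \tag{5.3.49}$$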

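because of (5.3.47) and

$$\sum_{u \in P_i^{t+1}} \gamma(u, G_i^0) + \sum_{u \in V_i^0 \setminus P_i^{t+1}} \gamma(u, G_i^0) = \sum_{u \in V_i^0} \gamma(u, G_i^0). \tag{5.3.50}$$

Therefore, we have

$$\begin{aligned}
\Delta(P_i^{t+1}) &= \mathrm{cap}(P_i^{t+1}, V_i^0 \setminus P_i^{t+1}) \\
&\ge \frac{1}{2}\pi_i\bigl((V_i^0 \setminus P_i^{t+1}) \cap X_i^0\bigr) \\
&\ge \frac{\sum_{u \in V_i^0 \setminus P_i^{t+1}} \gamma(u, G_i^0)}{2\lambda(n)} \\
&\ge \frac{\sum_{u \in V_i^0} \gamma(u, G_i^0)/2 - a_2/2}{2\lambda(n)} \\
&\ge \frac{\sum_{u \in P_i^{t+1}} \gamma(u, G_i^0) - a_2}{2\lambda(n)} \\
&= \frac{\sum_{u \in P_i^{t+1}} \gamma(u, G_i^t)}{2\lambda(n)},
\end{aligned}$$

where the last three (in)equalities are due to (5.3.49), (5.3.47), and (5.3.42), in this order. □

Next, let us bound the size of $\delta^t(P_i^{t+1})$.

Proof of Lemma 5.6: Given that $\sum_{u \in P_i^{t+1}} \gamma(u, G_i^t) \le \sum_{v \in V_i^{t+1}} \gamma(v, G_i^t)$ and $2\gamma(G_i^t) = \sum_{u \in P_i^{t+1}} \gamma(u, G_i^t) + \sum_{v \in V_i^{t+1}} \gamma(v, G_i^t)$, we have $\sum_{v \in V_i^{t+1}} \gamma(v, G_i^t) \le 2\gamma(G_i^t)$.

By terminating condition 2(b) in Figure 5.3.3, we have

$$\begin{aligned}
\delta^t(P_i^{t+1}) &\le \mathrm{dem}^t(P_i^{t+1}, V_i^t \setminus P_i^{t+1})\,\beta f_1 \\
&= \frac{\sum_{u \in P_i^{t+1}} \gamma(u, G_i^t)\cdot\sum_{v \in V_i^{t+1}} \gamma(v, G_i^t)}{\alpha_1\lambda(n)\cdot\gamma(G_i^t)} \\
&\le \frac{2\sum_{u \in P_i^{t+1}} \gamma(u, G_i^t)}{\alpha_1\lambda(n)} \\
&= \frac{4}{\alpha_1}\cdot\frac{\sum_{u \in P_i^{t+1}} \gamma(u, G_i^t)}{2\lambda(n)},
\end{aligned}$$

where $\mathrm{dem}^t(P_i^{t+1}, V_i^t \setminus P_i^{t+1})$ is defined in (5.3.29). By Lemma 5.5, the last expression is at most $\frac{4}{\alpha_1}\Delta(P_i^{t+1})$. □

6 Finding Disjoint Paths in Decomposed Graphs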

6.1  Outline  of  Routing  in  a  Decomposed  Subgraph  G 

We assume that we have the $\pi$-cut-linked subgraphs given by Theorem 4.2. We will treat each subgraph and its induced subproblem $(G, T)$ independently. We use $\pi(G)$ to denote $\pi(V(G))$ in the following sections. Let $X$ be the set of terminals of $T$ that is assigned a positive weight by the function $\pi$ in instance $G$. We further assume that $\pi(G) =$ Ω(log^ n). If not, we just route an arbitrary pair of terminals in $T$; otherwise, we use PROCEDURE EMBEDANDROUTE$(G, T, \pi)$ in Figure 6.1.1 to route. We first specify a few more parameters and conditions related to $(G, T)$; we then state Theorem 6.1, which we prove through the rest of the paper. Combining Theorem 6.1 and Theorem 4.2 proves Theorem 6.2.

6.1.1 Parameters and Conditions Regarding Subproblem (G, T)

- sampling probability $p = 12(\ln n)/\varepsilon^2\kappa = 1/(\omega\log^2 n + 1)$;

- number of split subgraphs $Z = 1/p = \omega\log^2 n + 1$;

- $W = (\omega\log^2 n + 1)/(1 - \varepsilon)$, for some $\varepsilon < 1$;

- $r \ge \max\{1, (\pi(G) - (W - 1))/(2W - 1)\}$, such that $\forall i \in [1, \ldots, r]$, $2W - 1 \ge \pi(X_i) = \sum_{v \in X_i} \pi(v) \ge W$ and $\pi(\mathcal{X}) \ge \pi(G) - (W - 1)$: i.e., at most $W - 1$ units of weight are not counted in $\mathcal{X}$.

0. Given graph $G$ with min-cut $\Theta(\log^3 n)$ and a weight function $\pi : V(G) \to \mathbb{R}^+$

1. $\{G^1, \ldots, G^Z\}$ = Split$(G, Z, \pi)$

2. $\{\mathcal{X}, \mathcal{C}\}$ = Clustering$(G^Z, \pi)$, where $\mathcal{X} = \{X_1, \ldots, X_r\}$ and $\mathcal{C} = \{C_1, \ldots, C_r\}$

3. Given a set of superterminals $\mathcal{X}$ of size $r$

4. Let $\mathcal{X}$ map to vertex set $V(H)$ of Expander $H$

5. For $t = 1$ to $\omega\log^2 n$

6.    $(S, \bar{S} = \mathcal{X} \setminus S)$ = KRV-FindCut$(\mathcal{X}, \{M_k : k < t\})$ s.t. $|S| = |\bar{S}| = r/2$

7.    Matching $M_t$ = FindMatch$(S, \bar{S}, G^t)$ s.t. $M_t$ is routable in $G^t$

8. Combine $M_1, \ldots, M_{\omega\log^2 n}$ to form the edge set $F$ on vertices $V(H)$

9. ExpanderRoute$(H, T, \mathcal{X})$

10. End

Figure 6.1.1. Procedure EMBEDANDROUTE$(G, T, \pi)$
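The parameters listed in Section 6.1.1 are tied together by simple identities, which the following sketch makes explicit. It is a minimal illustration under stated assumptions: the function name is hypothetical, $\log$ is taken to base 2, and $\omega$ and $\varepsilon$ are free constants whose values the text does not fix. Note that $Z$ is returned as a real number; an implementation would have to round it.

```python
import math


def split_parameters(n, omega=1.0, eps=0.5):
    """Compute p, Z, kappa and W from n, omega and eps (hypothetical helper)."""
    log_sq = math.log2(n) ** 2                   # log^2 n, base 2 assumed
    Z = omega * log_sq + 1                       # number of split subgraphs
    p = 1.0 / Z                                  # sampling probability p = 1/Z
    # kappa is chosen so that p = 12 ln(n) / (eps^2 * kappa) holds.
    kappa = 12 * math.log(n) * Z / eps ** 2
    W = Z / (1 - eps)                            # superterminal weight threshold
    return {"p": p, "Z": Z, "kappa": kappa, "W": W}


params = split_parameters(1024)
```

With these definitions $(1 - \varepsilon)W/Z = 1$, which is the identity used at the end of the proof of Lemma 6.8.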

6.1.2 Two Theorems to Prove

Theorem 6.1. Given an induced instance $(G, T)$ with the min-cut of $G$ being $\Omega(\log^3 n)$ and a weight function $\pi : V(G) \to \mathbb{R}^+$ such that $X$ is $\pi$-cut-linked in $G$ and $\pi(G) =$ Θ(log^ n), EMBEDANDROUTE routes at least max{1, Θ($\pi(G)$/log^ n)} pairs of $T$ in $G$ edge disjointly.

Theorem 6.2. Given an EDP instance $(\mathcal{G}, \mathcal{T})$, where $\mathcal{G}$ has a min-cut $\Omega(\lambda(n)\kappa)$, we can route $\Omega(\mathrm{OPT}^*(\mathcal{G}, \mathcal{T})/f)$ terminal pairs edge disjointly in $\mathcal{G}$, where the approximation factor $f$ is O($\lambda(n)\beta(\mathcal{G})W$ log^ n).

6.2  Obtaining  Z  Split  Subgraphs  of  G 

In this section, we analyze a procedure that splits a graph $G$, with min-cut $\kappa = \Theta(\log^3 n)$, into $Z$ subgraphs by extending a uniform sampling scheme from Karger [1994]. We thus obtain a set of cut-linked instances as in Lemma 6.1, which immediately follows from Theorem 6.3.

Procedure Split$(G, Z, \pi)$: Given a graph $G = (V, E)$ with min-cut $\kappa = \Theta(\log^3 n)$, a weight function $\pi : V(G) \to \mathbb{R}^+$, a set of terminals $X$ in $G$ such that $(G, X)$ is a $\pi$-cut-linked instance, and probability $p = 1/Z$.

Output: A set of randomized split subgraphs $G^1, \ldots, G^Z$ of $G$.

Each split subgraph $G^j$, $\forall j = 1, \ldots, Z$, inherits the same set of vertices of $G$; edges of $G$ are placed independently and uniformly at random into the $Z$ subgraphs; each $e = (u, v) \in E$ is placed between the same endpoints $u, v$ in the chosen subgraph. We retain the same weight function $\pi$ for all nodes in $V$ in each split subgraph $G^j$, $\forall j$.
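The splitting step described above can be sketched as follows. This is a minimal illustration, assuming the graph is given as a list of undirected edges; the function names `split_graph` and `cut_size` are hypothetical helpers introduced here, and the vertex set and weight function are simply shared by all parts.

```python
import random


def split_graph(edges, Z, seed=None):
    """Place each edge independently and uniformly at random into one of Z parts."""
    rng = random.Random(seed)
    parts = [[] for _ in range(Z)]
    for (u, v) in edges:
        j = rng.randrange(Z)          # each part is chosen with probability 1/Z
        parts[j].append((u, v))       # same endpoints in the chosen subgraph
    return parts


def cut_size(edges, S):
    """Number of edges with exactly one endpoint in the vertex set S."""
    S = set(S)
    return sum(1 for (u, v) in edges if (u in S) != (v in S))
```

Because every edge lands in exactly one part, the sizes of a cut $(S, V \setminus S)$ in the $Z$ parts always sum to its size in $G$; the analysis that follows shows that, under the min-cut condition, each part receives close to a $1/Z$ share of every cut.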

Lemma 6.1. With high probability, $X$ is $\frac{(1 - \varepsilon)\pi}{Z}$-cut-linked in $G^j$, $\forall j$, for some $\varepsilon < 1$.

Proof. Since $X$ is $\pi$-cut-linked in $G$, $|\delta(S)| \ge \pi(S \cap X)$, $\forall S$ such that $\pi(S \cap X) \le \pi(X)/2$ in $G$. Let $|\delta_j(S)|$ denote the size of the cut $(S, V \setminus S)$ in $G^j$. With probability $1 - O(\log^2 n/n^2)$, we have $|\delta_j(S)| \ge (1 - \varepsilon)p\,|\delta(S)| \ge (1 - \varepsilon)p\,\pi(S \cap X)$, for all $S$ such that $\pi(S \cap X) \le \pi(X)/2$ and all $j$, as shown in Theorem 6.3. Hence $X$ is $(1 - \varepsilon)\pi/Z$-cut-linked in $G^j$, $\forall j$. □

Theorem 6.3 says that all cuts can be preserved in all split graphs $G^1, \ldots, G^Z$ of $G$ that we thus obtain. Recall that for $S \subseteq V$, $|\delta_G(S)|$ denotes the size of the cut $(S, V \setminus S)$ in $G$. For the same cut $(S, V \setminus S)$, we have $E[|\delta_{G^j}(S)|] = p\,|\delta_G(S)|$ in $G^j$, $\forall j$, where $p$ is the probability that an edge $e \in E$ is placed in $G^j$, $\forall j$.

Theorem 6.3. Let $G = (V, E)$ be any graph with unit-weight edges and min-cut $\kappa$. Let $\varepsilon = \sqrt{3(d + 2)(\ln n)/p\kappa}$. If $\varepsilon < 1$, then with probability $1 - O(\log^2 n/n^d)$, every cut $(S, V \setminus S)$ in every subgraph $G^1, G^2, \ldots, G^Z$ of $G$ has value between $(1 - \varepsilon)$ and $(1 + \varepsilon)$ times its expected value $p\,|\delta_G(S)|$.

We give an overview of our proof by introducing a definition by Karger [1994], regarding a uniform random sampling scheme on an unweighted graph $G = (V, E)$; Lemma 6.2 immediately follows from this definition. We then state Karger's theorem regarding preserving all cuts of $G$ in a sampled subgraph, under a certain min-cut condition. Finally, we show the details of our proof of Theorem 6.3. For the sake of completeness, we also give Karger's proof of Theorem 6.4.

Definition 6.1. (Karger [1999]) A $p$-skeleton of $G$ is a random subgraph $G(p)$ constructed on the same vertices of $G$ by placing each edge $e \in E$ in $G(p)$ independently with probability $p$.

Lemma 6.2. Every randomized subgraph $G^j$, $\forall j$, is a $p$-skeleton of $G$.

Proof. Recall the construction of a random subgraph $G^j$, $\forall j$, of $G$: on the same set of vertices as $G$, each edge $e \in E$ of the original graph $G$ is placed in $G^j$ independently with probability $p$. Hence, $G^j$, $\forall j$, is a $p$-skeleton of $G$ by Definition 6.1. □

Theorem 6.4. (Karger [1999]) Let $G$ be a graph with unit-weight edges and min-cut $\kappa$. Let $p = 3(d + 2)(\ln n)/\varepsilon^2\kappa$. With probability $1 - O(1/n^d)$, every cut in a $p$-skeleton of $G$ has value between $(1 - \varepsilon)$ and $(1 + \varepsilon)$ times its expected value.

Proof of Theorem 6.3: Define an indicator variable $X_j^e$, $\forall j$, $\forall e \in E$, such that $X_j^e = 1$ when $e$ is placed in $G^j$ and 0 otherwise; hence $X_j^e$ is a Bernoulli random variable with success probability $p$, $\forall j$, $\forall e$. Note that the random variables $X_j^e$, $\forall j = 1, \ldots, 1/p$, are not independent; in fact, $\sum_{j=1}^{1/p} X_j^e = 1$ for all $e$.

Consider a cut $(S, \bar{S})$ of size $c$ in $G$. Let $X_j^{e_1}, \ldots, X_j^{e_c}$ be the indicator variables that signal whether edges $e_1, e_2, \ldots, e_c$ of cut $(S, \bar{S})$ appear in $G^j$. Define $X_j(S, \bar{S}) = \sum_{k=1}^{c} X_j^{e_k}$ as the size of the cut in a random subgraph $G^j$ of $G$. Given that $X_j^{e_1}, \ldots, X_j^{e_c}$ are i.i.d. random variables whose common distribution is the Bernoulli distribution with parameter $p$, we can apply the Chernoff bound to obtain the following lemma.

Lemma 6.3. Consider a cut $(S, V \setminus S)$ of size $c$ in an unweighted graph $G = (V, E)$. Let $X_j(S, \bar{S})$ be the size of the corresponding cut in a randomized split graph $G^j$. Then $\forall S$, $\forall j$, we have

$$\Pr\bigl[\,|X_j(S, V \setminus S) - pc| > \varepsilon pc\,\bigr] \le 2e^{-\varepsilon^2 pc/3}. \tag{6.2.1}$$

Lemma 6.4. (Chernoff [1952]) Let $X$ be a sum of independent Bernoulli random variables with success probabilities $p_1, \ldots, p_m$ and expected value $\mu = \sum_i p_i$. Then for $\varepsilon < 1$

$$\Pr\bigl[\,|X - \mu| > \varepsilon\mu\,\bigr] \le 2e^{-\varepsilon^2\mu/3}.$$

Let $r = 2^n - 2$ be the number of cuts in graph $G$, and hence in $G^1, \ldots, G^Z$, and let $c_1, \ldots, c_r$ be the expected values of the $r$ cuts in a $p$-skeleton listed in nondecreasing order so that $p\kappa = c_1 \le c_2 \le \cdots \le c_r$. Given a split graph $G^j$, $\forall j$, let $E_k^j$, $\forall j$, $\forall k$, be the event that the value of a cut $X_j(S, V \setminus S)$ in $G^j$ diverges from its expectation $c_k$ by more than $\varepsilon c_k$. First we have $\Pr[E_k^j] \le 2e^{-\varepsilon^2 c_k/3}$ by Lemma 6.3. We then apply a union bound to sum up (6.2.1) for all $r$ cuts in $G^j$, $\forall j$.

Given that every random split subgraph $G^j$, $\forall j$, is a $p$-skeleton of $G$ by Lemma 6.2, we apply Karger's statement, in the form below, to all subgraphs $G^j$ with the following parameters: $p = 12(\ln n)/\varepsilon^2\kappa$ and $\kappa = 12(\ln n)(\omega\log^2 n + 1)/\varepsilon^2$ for a given $\varepsilon$; following Karger's proof of Theorem 6.4, we have:

Lemma 6.5. (Karger [1999]) $\sum_{k=1}^{r} \Pr[E_k^j] \le O(1/n^2)$.

We can then use a union bound to sum up the probabilities of bad events across all split subgraphs $G^1, \ldots, G^Z$ of $G$, which yields the following:

$$\sum_{j=1}^{Z}\sum_{k=1}^{r} \Pr[E_k^j] \le O(\log^2 n/n^2). \tag{6.2.2}$$

Note that $E_k^j$, $\forall j = 1, \ldots, Z$, are not independent, since the indicator random variables that contribute to the value of $X_j(S, V \setminus S)$ are not at all independent across all subgraphs. However, we only use a union bound, which does not assume anything about dependence among events. □
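To see how the sampling probability and the Chernoff tail interact, the sketch below evaluates the tail for a minimum cut. It assumes the two-sided Chernoff form $2e^{-\varepsilon^2\mu/3}$ and the choice $p = 3(d + 2)(\ln n)/\varepsilon^2\kappa$ from Theorem 6.4; the function names and sample values are hypothetical.

```python
import math


def sampling_probability(n, kappa, eps, d=2):
    """p = 3 (d + 2) ln(n) / (eps^2 * kappa), as in Theorem 6.4."""
    return 3 * (d + 2) * math.log(n) / (eps ** 2 * kappa)


def chernoff_tail(eps, mu):
    """Two-sided Chernoff bound 2 * exp(-eps^2 * mu / 3), assumed form."""
    return 2 * math.exp(-eps ** 2 * mu / 3)


def min_cut_failure_bound(n, kappa, eps, d=2):
    """Tail bound for a cut of size kappa, whose expected sampled size is p * kappa."""
    p = sampling_probability(n, kappa, eps, d)
    return chernoff_tail(eps, p * kappa)


bound = min_cut_failure_bound(n=1000, kappa=5000, eps=0.5, d=2)
```

With this choice of $p$ the bound for a minimum cut simplifies to $2n^{-(d+2)}$, independent of $\varepsilon$ and $\kappa$; larger cuts have larger expectations and therefore smaller tails, which is what the union bound over all cuts exploits.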

Proof of Theorem 6.4: (Karger [1999]) We give a sketch of Karger's proof as shown in Karger [1999] here for the sake of completeness. To prove Theorem 6.4, Karger uses a union bound to show that the sum of the probabilities of all bad events in a $p$-skeleton of $G$ is $O(1/n^d)$, where a bad event refers to some cut in a $p$-skeleton of $G$ diverging from its expected value $k$ by more than $\varepsilon k$. The proof of this claim follows by using two lemmas:

Lemma 6.6. (Karger [1999]) In an undirected graph, the number of $\alpha$-minimum cuts is less than $n^{2\alpha}$.

The "expected value" graph $\hat{G}$ of $G^j$, $\forall j$, is a weighted graph with all vertices and edges of the original unweighted graph $G = (V, E)$, and with edge weight $p$ assigned to each edge $e$, $\forall e \in E$. Note that the minimum cut of $\hat{G}$ is $p\kappa$, where $\kappa$ is the minimum cut of $G$. Lemma 6.6 applied to $\hat{G}$, the "expected value" graph of a $p$-skeleton of $G$, states that the number of cuts within a factor $\alpha$ of the minimum $p\kappa$ increases exponentially with $\alpha$. On the other hand, the Chernoff bound says that the probability that one such cut diverges too far from its expected value decreases exponentially with $\alpha$, as shown in (6.2.1). Combining these two lemmas and balancing the exponential rates proves Theorem 6.4 using a union bound. □

6.3  Forming  Superterminals  that  are  Well-Linked 

The procedure in this section constructs superterminals as follows. It finds connected subgraphs $C$ in $G^Z$, where $\pi(C) = \Omega(\log^2 n)$, each connecting a subset of terminals. Roughly, the idea is that these clustered terminals are better connected than individual terminals. They are well linked in the sense that any cut that splits off $K$ superterminals as one entity contains at least $K$ edges in $G^j$, $\forall j$. This allows us to compute congestion-free maximum flows in Section 6.4.1.

Given split subgraphs $G^1, \ldots, G^Z$ of $G$, each with the same weight function $\pi$ on its vertex set $V(G^j) = V$, $\forall j$, that we obtain through PROCEDURE Split$(G, Z, \pi)$, we aim to find a set $\mathcal{X} = \{X_1, \ldots, X_r\}$ of node-disjoint "superterminals", where each superterminal $X_i \in \mathcal{X}$ consists of a subset of terminals in $X$ and each $X_i$ gathers a weight between $W$ and $2W - 1$. In addition, we want to find an edge-disjoint set of clusters $\mathcal{C} = \{C_1, \ldots, C_r\}$, where $C_i = (V_i, E_i)$, such that $X_i \subseteq V_i$ and $C_i$ is a connected component, and hence all nodes in $X_i$ are connected through $E_i$. W.l.o.g., we pick $G^Z$ for forming such clusters $C_i$, $\forall i$; note that $G^Z$ is a connected graph with a min-cut of $\Theta(\log n)$, w.h.p., by Theorem 6.3.

Procedure Clustering$(G^Z, \pi)$: Given a split subgraph $G^Z$ and a weight function $\pi : V(G^Z) \to \mathbb{R}^+$ with $\pi(V(G^Z)) = \pi(G) \ge W$.

Output: $\mathcal{X} = \{X_1, \ldots, X_r\}$ and $\mathcal{C} = \{C_1, \ldots, C_r\}$ as specified in Lemma 6.7.

We group subsets of vertices of $V$ in an edge-disjoint manner, following a procedure from Chekuri et al. [2004a], by choosing an arbitrary rooted spanning tree of $G^Z$ and greedily partitioning the tree into a set $\mathcal{C}$ of edge-disjoint subgraphs of $G^Z$.
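One way to realize such a greedy partition is sketched below. This is an illustrative bottom-up scheme written for this note, not necessarily the exact procedure of Chekuri et al.; it assumes integer node weights in $[0, W]$, a rooted spanning tree given as a hypothetical `children` dictionary, and the function name `cluster_tree` is likewise an assumption.

```python
def cluster_tree(children, weight, root, W):
    """Greedily cut a rooted tree into edge-disjoint clusters of weight in [W, 2W - 1].

    Returns a list of (terminals, edges) pairs; weight left over at the root
    (less than W) stays unclustered.
    """
    clusters = []

    def visit(v):
        # Pending terminals and tree edges hanging below v, not yet clustered.
        nodes = [v] if weight.get(v, 0) > 0 else []
        edges, total = [], weight.get(v, 0)
        if total >= W:
            clusters.append((nodes, edges))
            nodes, edges, total = [], [], 0
        for c in children.get(v, []):
            c_nodes, c_edges, c_total = visit(c)   # each pending part weighs < W
            nodes += c_nodes
            edges += c_edges + [(v, c)]            # connect the child's part through v
            total += c_total
            if total >= W:                         # emit a cluster and start afresh at v
                clusters.append((nodes, edges))
                nodes, edges, total = [], [], 0
        return nodes, edges, total

    visit(root)
    return clusters
```

Each emitted cluster merges a pending part of weight below $W$ with one more part of weight below $W$, so with integer weights its total lies between $W$ and $2W - 2$. Clusters emitted at the same node may share that node as a connector but never share an edge, which matches the edge-disjoint (rather than vertex-disjoint) requirement on the $C_i$.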

Lemma 6.7. (CKS2004 Chekuri et al. [2004a]) Let $G^Z$ be a connected graph with a weight function $\pi : V(G^Z) \to [0, W]$ such that $\pi(V(G^Z)) \ge W$. We can find $r \ge \max\{1, (\pi(G) - (W - 1))/(2W - 1)\}$ edge-disjoint connected subgraphs, $C_1 = (V_1, E_1), \ldots, C_r = (V_r, E_r)$, such that there exist vertex-disjoint subsets $X_1, \ldots, X_r$, and for each $i$: (a) $X_i \subseteq V_i$ and (b) $2W - 1 \ge \sum_{v \in X_i} \pi(v) \ge W$.

Result. To get an intuition of the purpose of forming such clusters, consider a cut $(U, V \setminus U)$ in a split subgraph $G^j$, $\forall j$. Let $U$ be a subset of $V(G)$ such that $\pi(U) = \sum_{x \in U \cap X} \pi(x) \le \pi(X)/2$. Let $K$ be the number of superterminals that are contained in $U$. We have the following lemma, which captures the notion of superterminals being "well-linked", with a hint of Definition 4.3.

Lemma 6.8. $\forall$ split subgraphs $G^1, \ldots, G^Z$, where $Z = \omega\log^2 n + 1$, and $\forall U \subseteq V(G)$ s.t. $\pi(U) \le \pi(X)/2$, $|\delta_{G^j}(U)| \ge K$, where $K = |\{X_i \in \mathcal{X} : X_i \subseteq U\}|$.

Proof. With high probability, $X$ is $\frac{(1 - \varepsilon)\pi}{\omega\log^2 n + 1}$-cut-linked in $G^1, \ldots, G^Z$, as shown in Lemma 6.1. Recall that in our clustering scheme, the total weight of all terminals in one cluster is at least $W = \frac{\omega\log^2 n + 1}{1 - \varepsilon}$; then $\forall j$,

$$\begin{aligned}
|\delta_{G^j}(U)| &\ge \sum_{x \in U \cap X} \frac{(1 - \varepsilon)\pi(x)}{\omega\log^2 n + 1} \\
&\ge \frac{1 - \varepsilon}{\omega\log^2 n + 1} \sum_{X_i \subseteq U}\ \sum_{x \in X_i} \pi(x) \\
&\ge \frac{(1 - \varepsilon)KW}{\omega\log^2 n + 1} \\
&\ge K.
\end{aligned}$$

□


6.4  Construct  and  Embed  an  Expander  in  G 

In this section, we use the superterminals from the previous section as nodes in an expander $H$ that we embed in $G$. The edges of $H$ are defined using a technique in Khandekar et al. [2006] that builds an expander using $O(\log^2 n)$ matchings. We embed this expander in $G$ by routing each matching in one of the split graphs using a maximum flow computation. This allows us to embed $H$ into $G$ with no congestion. The following procedure restates this outline. Theorem 6.5 is a main technical contribution of this paper.

Procedure EmbedExpander$(G^1, \ldots, G^{\omega\log^2 n}, \mathcal{X})$:

0. Given a set of points $V(H)$ of size $k$

1. for $t = 1$ to $\omega\log^2 n$

2.    $(S, \bar{S} = V(H) \setminus S)$ = KRV-FindCut$(V(H), \{M_k : k < t\})$ s.t. $|S| = |\bar{S}| = k/2$

3.    $M_t$ = FindMatch$(S, \bar{S})$ s.t. $M_t$ is a matching between $S$ and $\bar{S}$

4. Combine $M_1, \ldots, M_{\omega\log^2 n}$ to form the edge set $F$ on vertices $V(H)$

5. End

Figure 6.4.2. KRV-Procedure CONSTRUCTING AN α-EXPANDER H.

Output: An expander $H = (V', F)$ routable in $G$ s.t. $|V'| = r$ and $\forall i \in V'$, $\pi(i) = \pi(X_i)$ and $\pi(H) = \pi(\mathcal{X})$; $F$ consists of $M_1, \ldots, M_{\omega\log^2 n}$.

We  use  Step  (3)  to  (8)  of  Procedure  EmbedAndRoute  in  Figure  6.1.1,  where 
we  substitute  PROCEDURE  FindMatch  with  Figure  6.4.3  while  relying  on  an 
existing  PROCEDURE  KRV-FindCut  Khandekar  et  al.  [2006].  At  each  round  t, 
we  use  KRV-FindCut  to  generate  an  equal-sized  partition  {S,X\S  =  S)','we  then 
find  a  matching  Mt  between  S  and  S  by  computing  a  single-commodity  max-flow 
using  FindMatch(5',S,G')  in  G*,  that  we  add  to  F  as  edges. 

Theorem 6.5. (a) EmbedExpander constructs a $1/4$-expander $H = (V', F)$; (b) in addition, $H$ is embedded into $G$ as follows. Each node $i$ of $H$ corresponds to a superterminal $X_i$ in $\mathcal{X}$ in $G$ such that all superterminals are mutually node disjoint and each superterminal is connected by a spanning tree, $T_i$, in $G$. Each edge $(i, j)$ in $H$ corresponds to a path $P_{ij}$ from a node in $X_i$ to a node in $X_j$. All paths $P_{ij}$ and trees $T_i$ are mutually edge disjoint in $G$.

Proof. The expander property (a) follows from a result of Khandekar, Rao and Vazirani [Khandekar et al., 2006]; they show that the procedure in Figure 6.4.2 produces an expander $H$.

Theorem 6.6. (KRV06, Khandekar et al. [2006]) Given a set of nodes $V(H)$ of size $k$, there exists a KRV-FindCut procedure s.t. given any FindMatch procedure, the KRV-Procedure as in Figure 6.4.2 produces an $\alpha$-expander graph $H$, for $\alpha \ge 1/4$.

Each edge $e = (i, j)$ in the matching $M_t$ maps to an integral flow path that connects $X_i$ and $X_j$ in $G^t$; all such flow paths can be simultaneously routed in $G^t$ edge disjointly due to the max-flow computation, as we show in Lemma 6.9. Since each matching $M_t$ is on a unique split subgraph $G^t$, the entire set of edges in $M_1, \ldots, M_{\omega\log^2 n}$, which comprise the edge set $F$ of $H$, correspond to edge disjoint paths in $G^1, \ldots, G^{Z-1}$, where $Z = \omega\log^2 n + 1$. Finally, all spanning trees are constructed using a disjoint set of edges in $G^Z$ as in Lemma 6.7. □

6.4.1  Finding  a  Matching  through  a  Max-flow  Construction 


0. Given an equal partition $(S, \bar S)$ of $\mathcal{X}$, we form a flow graph $G'$ from $G^t$ by adding auxiliary nodes and directed unit-capacity edges:

1. Add special source and sink nodes $s_0$ and $t_0$;

2. Add nodes $s_1, \ldots, s_{r/2}$; add an edge from $s_0$ to $s_k$, $\forall k = 1, \ldots, r/2$;

3. Add nodes $t_1, \ldots, t_{r/2}$; from each $t_k$, $\forall k = 1, \ldots, r/2$, add an edge to $t_0$;

4. From each $s_k$, $\forall k$, add an edge to each terminal $x \in X_{i_k}$ s.t. $X_{i_k} \in S$;

5. To each node $t_k$, add an edge from each terminal $x \in X_{j_k}$ s.t. $X_{j_k} \in \bar S$;

6. Route a max-flow from $s_0$ to $t_0$;

7. Decompose the flow to obtain a matching between $S$ and $\bar S$;

8. End


Figure 6.4.3. Procedure FindMatch($S, \bar S, G^t$)
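
The steps of Figure 6.4.3 can be sketched in a few lines of Python. This is only an illustrative sketch, not the thesis's implementation: the function name `find_match`, the tuple labels for the auxiliary nodes, and the choice of BFS augmenting paths for the max-flow step are all assumptions made here, and superterminals are passed simply as sets of graph nodes.

```python
from collections import defaultdict, deque

def find_match(edges, S, S_bar):
    """Match superterminals in S to those in S_bar via a unit-capacity max-flow.

    edges    : undirected edges (u, v) of the split graph, one unit of capacity each
    S, S_bar : equal-length lists of superterminals, each a set of graph nodes
    Returns index pairs (i, j), meaning S[i] is matched to S_bar[j].
    Assumes graph nodes never collide with the ("aux-s", k) / ("aux-t", k) labels.
    """
    cap = defaultdict(lambda: defaultdict(int))  # residual capacities

    def add_arc(u, v, both=False):
        cap[u][v] += 1
        cap[v][u] += 1 if both else 0  # the reverse entry must exist for residuals

    for u, v in edges:                 # an undirected edge becomes two unit arcs
        add_arc(u, v, both=True)
    src, snk = ("aux-s", 0), ("aux-t", 0)
    t_index = {}
    for k, X in enumerate(S):          # steps 1, 2 and 4
        add_arc(src, ("aux-s", k + 1))
        for x in X:
            add_arc(("aux-s", k + 1), x)
    for k, X in enumerate(S_bar):      # steps 1, 3 and 5
        add_arc(("aux-t", k + 1), snk)
        t_index[("aux-t", k + 1)] = k
        for x in X:
            add_arc(x, ("aux-t", k + 1))
    orig = {u: dict(nbrs) for u, nbrs in cap.items()}

    def augment():                     # step 6: one BFS augmenting path
        parent = {src: None}
        queue = deque([src])
        while queue and snk not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if snk not in parent:
            return False
        v = snk
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        return True

    while augment():
        pass

    # Step 7: keep only arcs carrying positive net flow, then walk each unit of flow.
    flow = {u: {v: orig[u][v] - c for v, c in nbrs.items() if orig[u][v] > c}
            for u, nbrs in cap.items()}
    matching = []
    for k in range(len(S)):
        u = ("aux-s", k + 1)
        if not flow[src].get(u):
            continue                   # s_k carries no flow, so it stays unmatched
        while u not in t_index:
            v = next(iter(flow[u]))
            flow[u][v] -= 1
            if flow[u][v] == 0:
                del flow[u][v]
            u = v
        matching.append((k, t_index[u]))
    return matching
```

On a 4-cycle with two single-node superterminals on each side, the sketch returns a perfect matching of size 2; whether a perfect matching exists in general is exactly what Lemma 6.9 establishes under the cut-linkedness assumption.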

In this section, we show that given an arbitrary equal partition $(S, \bar S)$ of the set $\mathcal{X} = \{X_1, \ldots, X_r\}$ that we obtain through Procedure Clustering($G^Z, \pi$), we can use the procedure in Figure 6.4.3 to route a max-flow of size $r/2$, such that the integral flow paths that we obtain through flow decomposition induce a perfect matching between $S$ and $\bar S$. Let $S = \{X_{i_1}, \ldots, X_{i_{r/2}}\}$ and $\bar S = \{X_{j_1}, \ldots, X_{j_{r/2}}\}$.

Lemma 6.9. In each sampled graph $G^t$, FindMatch produces a perfect matching $M_t$ between an equal partition $(S, \bar S)$ of $\mathcal{X}$ such that for each edge $e = (i, j) \in M_t$, there is an integral unit-flow path $P_{ij}$ from a terminal in $X_i \in S$ to a terminal in $X_j \in \bar S$. All paths $P_{ij}$ s.t. $(i, j) \in M_t$ are edge disjoint in $G^t$.

We  first  prove  the  following  lemma. 


Lemma 6.10. Every $s_0$-$t_0$ cut has size at least $r/2$ in the flow graph $G'$.

Proof. Let $(U, \bar U)$ be a cut in the flow graph that separates $s_0$ from $t_0$; w.l.o.g., let $U$ be the subset such that $\pi(U \cap \mathcal{X}) \le \pi(\mathcal{X})/2$, and let $s_0 \in U$ (otherwise, we can just rename all the auxiliary nodes and the two subsets $S$ and $\bar S$).

Consider any superterminal $X \in \mathcal{X}$ that we obtained through Lemma 6.7; if $X$ is contained either in $U$ or $\bar U$, we call such a superterminal $X$ uncut; otherwise, we say $X$ is cut by $(U, \bar U)$.

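(1) Let $K_c^{S} = |\{X \in S : X \cap U \ne \emptyset,\ X \cap \bar U \ne \emptyset\}|$ denote the number of superterminals in $S$ that are cut by $(U, \bar U)$.

(2) Let $K_U^{S} = |\{X \in S : X \subseteq U\}|$ be the number of superterminals in $S$ that are contained in $U$;

(3) Let $K_{\bar U}^{S} = |\{X \in S : X \subseteq \bar U\}|$ denote the number of superterminals in $S$ that are contained in $\bar U$; hence $K_c^{S} + K_U^{S} + K_{\bar U}^{S} = r/2$, where $r = |\mathcal{X}|$.

(4) Let $K_c^{\bar S} = |\{X \in \bar S : X \cap U \ne \emptyset,\ X \cap \bar U \ne \emptyset\}|$ denote the number of superterminals in $\bar S$ that are cut.

(5) Let $K_U^{\bar S} = |\{X \in \bar S : X \subseteq U\}|$ denote the number of superterminals in $\bar S$ that are contained in $U$.

Given that $G$ is $\pi$-cut-linked, we know that the sampled graph $G^t$ is $(1-\varepsilon)\pi/(\omega\log^2 n + 1)$-cut-linked whp by Lemma 6.1. Recall that in our clustering scheme, the total weight of all terminals in one superterminal is at least $W$. Note that there is at least one directed auxiliary edge crossing the cut for all superterminals except those in $S$ that are contained in $U$ or those in $\bar S$ that are contained in $\bar U$.

Thus we know

$$
\begin{aligned}
|\delta_{G'}(U)| &\ge \frac{(1-\varepsilon)\,\pi(U \cap \mathcal{X})}{\omega\log^2 n + 1} + K_c^{S} + K_{\bar U}^{S} + K_c^{\bar S} + K_U^{\bar S} \\
&\ge \frac{(1-\varepsilon)\,(K_U^{S} + K_U^{\bar S})\,W}{\omega\log^2 n + 1} + K_c^{S} + K_{\bar U}^{S} + K_c^{\bar S} + K_U^{\bar S} \\
&\ge K_U^{S} + K_U^{\bar S} + K_c^{S} + K_{\bar U}^{S} + K_c^{\bar S} + K_U^{\bar S} \\
&\ge K_c^{S} + K_U^{S} + K_{\bar U}^{S} \\
&\ge r/2.
\end{aligned}
$$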
Hence we have shown that every cut $(U, \bar U)$ in the flow graph $G'$ has size at least $r/2$. □

Proof of Lemma 6.9: By Lemma 6.10, and the fact that there exists an $s_0$-$t_0$ cut of size $r/2$ (e.g., $(\{s_0\}, V(G') \setminus \{s_0\})$), we know the $s_0$-$t_0$ min-cut is $r/2$. Hence by the max-flow min-cut theorem, we know that there exists a max-flow of size $r/2$ from $s_0$ to $t_0$. We next decompose the max-flow into $r/2$ integral flow paths, which induce a perfect matching $M_t$ between $S$ and $\bar S$ as follows. Consider an integral flow path $P_k$, $\forall k = 1, \ldots, r/2$. Let the directed path $P_k$ start with $s_0, s_k, x \in X_{i_k} \in S$ for some terminal $x$, and let $P_k$ end with $y \in X_{j_{k'}} \in \bar S, t_{k'}, t_0$ for some $k' \in [1, \ldots, r/2]$ and some terminal $y$. No other path in the max-flow can go through the same pair of superterminals $X_{i_k}, X_{j_{k'}}$ due to the capacity constraints on edges $(s_0, s_k)$ and $(t_{k'}, t_0)$. Hence $M_t = \{(i_k, j_{k'}) : k' \in [1, \ldots, r/2], \text{ where } k \in [1, \ldots, r/2]\}$ is a perfect matching between $S$ and $\bar S$. □

6.5  Routing  on  an  Expander  H  Node  Disjointly 

In this section, we show that the following greedy algorithm routes $\Omega(K/\log^5 n)$ pairs of terminals in $H$, where $K = |V(H)| = \Omega(\pi(G)/W)$.

Procedure ExpanderRoute($H$, $T$, $\mathcal{X}$): Given an uncapacitated expander $H$ with at least $512\log^5 n$ nodes and node degree $\omega\log^2 n$. While there is a pair $(s, t)$ in $T' \subseteq T$ whose path length is less than $D$ in $H = (V, E)$, where $D = a_3\omega\log^3 n$ and $a_3 = 32$ is a constant: remove both nodes and edges from $H$ along a path through which we connect a pair of terminals in $T$.

Since we take away both nodes and edges as we route a path across the expander $H$ due to the node capacity constraints on $V(H)$, routing the set $P$ of pairs via integral paths on $H$ induces no congestion in $G$ by Theorem 6.5. We now argue that $|P|$ is large to finish our proof. Let $H'$ be the remaining graph of expander $H = (V, E)$, after we take away nodes and edges along the paths used to route $P$. Note that all remaining pairs $T' \subseteq T$ in $H'$ must have distance at least $D$. This is the main condition that allows us to prove the following theorem.
Theorem 6.7. The procedure above routes $\Omega(K/\log^5 n)$ pairs, node disjointly, in a degree-$(\omega\log^2 n)$ expander $H = (V, E)$ with $K \ge 512\log^5 n$ nodes.

We first prove the following lemma regarding a multicut $L$ in $H'$.

Lemma 6.11. There exists a multicut of size at most $K/2a_3$ in the remaining graph $H'$ of $H$.

Proof. Let us first state the following lemma, which follows from arguments of Garg et al. [1996].

Lemma 6.12. If all remaining terminal pairs in $T' \subseteq T$ have distances at least $D$ in $H'$, then there exists a multicut $L$ in $H' = (V', E')$ of size $|E'|\log n/D$ in $H'$ that separates every source and sink pair $s_i t_i \in T'$.

Applying Lemma 6.12 to $H'$, we have that there exists a multicut of size at most $K\omega\log^3 n/2D = K/2a_3$, given that $|E'| \le |E| = K\omega\log^2 n/2$ in the remaining graph $H'$. □

We prove Theorem 6.7 by noting that condition 1 of Theorem 4.2 implies that any multicut $L$ of the terminals in $H'$ ensures that no piece in $H'$ separated by $L$ contains more than half the weight of all terminals in $H$. We use this fact to show that the multicut $L$ can be rearranged to find a "weight-balanced" cut in $H'$, which corresponds to a node-balanced cut in $H$. Any node-balanced cut in $H$, however, must have at least $\Omega(K)$ edges. Using a proper choice of $a_3$, we force this balanced cut to contain at most half as many edges in $H'$ as in $H$. Thus, we show that $\Omega(K)$ edges have been removed when routing $P$. Since routing each such pair removes at most $D\omega\log^2 n = O(\log^5 n)$ edges, we conclude that $|P|$ must be $\Omega(K/\log^5 n)$.

Proof of Theorem 6.7: Recall that initially $\pi(H) = \pi(\mathcal{X}) \ge \pi(G) - (W - 1)$, since at most $W - 1$ of $\pi(G)$ is not assigned to any node in $H$, and each node in $H$ has weight between $W$ and $2W - 1$, as shown in the proof of Lemma 6.7. Hence the total weight taken away by routing $P$ terminal pairs of distance at most $D$ is at most $DP(2W - 1)$.

To facilitate our analysis, we first alter $\pi$ slightly to generate a new function $\pi'$.

Procedure Alter($\pi$, $\pi'$): For a pair of terminals $uv \in T$ such that $u$ takes away a certain amount of weight, we remove the same amount (as specified in condition 1 of Theorem 4.2) from $\pi(v)$ if $v$ remains in $H'$, and define this updated weight of $H'$ as $\pi'(H')$. Thus we have $\pi'(H') \ge \pi(G) - (W - 1) - 2DP(2W - 1)$.

It is easy to see that only remaining pairs $uv \in T'$ contribute a positive weight to $\pi'(H')$, according to their flow in $f$ as in condition 1 of Theorem 4.2; hence each connected component in $H'$, separated by multicut $L$, has a weight of at most $\pi'(H')/2$.

Let $L$ be the multicut that separates all remaining terminal pairs $T' \subseteq T$ in $H'$. $L$ cuts the graph $H'$ and hence groups nodes in $V(H')$ into clusters, such that the weight of each cluster according to $\pi'$ is less than half of the total remaining weight $\pi'(H')$ of $H'$, since each pair of terminals that contribute the same amount of weight to $\pi'(H')$ must belong to different multicut clusters.

We then use $L$ to find a weight-balanced cut $(U', V' \setminus U')$ in $H'$ such that each side has weight at least $\pi'(H')/4$, where $\pi'(H') \ge \pi(G) - (W - 1) - 2(2W - 1)D|P|$. It is straightforward to verify that any partition $(U, V(H) \setminus U)$ in $H$, such that $U' \subseteq U$ and $(V' \setminus U') \subseteq (V(H) \setminus U)$, is node-balanced in $H$, as shown in Lemma 6.13 and Lemma 6.14.

We build a $(1/4, 3/4)$-weight-balanced partition of $H'$ in the following way: start with two empty sides $A$ and $B$, and repeatedly add the connected components of $H'$ (after removing the multicut $L$) to the smaller side. Each component contains at most $\pi'(H')/2$ due to condition 1 of Theorem 4.2 and Procedure Alter; in the end neither side can contain more than $3\pi'(H')/4$ of the weight. Indeed, consider the step where, w.l.o.g., side $A$ was put over $3/4$ of $\pi'(H')$ by adding a component $d$: in that step, $d$ could not have been added to $A$, since $\pi'(A) \gt \pi'(H')/4 \gt \pi'(B)$ before $d$ was added, given that $\pi'(d) \le \pi'(H')/2$.
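
The balancing step above is a plain greedy rule, and a short sketch may make it concrete. The function name `balanced_partition` and the representation of components by their weights alone are assumptions made for illustration; the guarantee it relies on is the one argued above, namely that no component weighs more than half of the total.

```python
def balanced_partition(weights):
    """Greedily assign component weights to two sides, always the lighter one.

    weights : weights of the connected components of H' after removing L,
              each assumed to be at most half of the total weight.
    Returns (sides, totals): component indices on each side and side weights.
    """
    sides = ([], [])
    totals = [0, 0]
    for idx, w in enumerate(weights):
        side = 0 if totals[0] <= totals[1] else 1  # the currently lighter side
        sides[side].append(idx)
        totals[side] += w
    return sides, totals
```

Under the stated assumption, neither side ends with more than 3/4 of the total weight, so each side keeps at least 1/4.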

Lemma 6.13. Let $(U', V(H') \setminus U')$ be a $(1/4, 3/4)$-weight-balanced cut in $H'$. Consider any cut $(U, V(H) \setminus U)$ in $H$, such that $U' \subseteq U$ and $(V(H') \setminus U') \subseteq (V(H) \setminus U)$ before we route any of the $P$ paths:

$$\min(|U|, |V \setminus U|) \ge \frac{\pi(G) - (W - 1) - 2DP(2W - 1)}{4(2W - 1)}.$$

Proof. Indeed, if $U$ is the smaller side, $|U| \ge |U'|$; otherwise, we have $|V(H) \setminus U| \ge |V(H') \setminus U'|$. For both $U'$ and $V(H') \setminus U'$, we have

$$|U'| \ge \frac{\pi'(U')}{2W - 1} \ge \frac{\pi'(H')}{4(2W - 1)} \ge \frac{\pi(G) - (W - 1) - 2DP(2W - 1)}{4(2W - 1)},$$

since each node in $V(H')$ has weight at most $2W - 1$ despite alterations on terminal weights, and $\pi'(H') \ge \pi(G) - (W - 1) - 2DP(2W - 1)$. Therefore $\min(|U|, |V \setminus U|)$ satisfies the same lower bound. □

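Lemma 6.14. $|\delta_H(U)| \ge \phi(H)\left(\frac{K}{8} - DP/2\right)$, for any $(U, V(H) \setminus U)$ as defined in Lemma 6.13.

Proof. By the edge expansion property of expander $H$, we get the lower bound on the size of the cut $(U, V \setminus U)$:

$$
\begin{aligned}
|\delta_H(U)| &\ge \phi(H)\min(|U|, |V \setminus U|) \\
&\ge \phi(H)\,\frac{\pi(G) - (W - 1) - 2DP(2W - 1)}{4(2W - 1)} \\
&\ge \phi(H)\left(\frac{\pi(G)}{8W} + \frac{\pi(G)}{16W^2} - \frac{1}{8} - DP/2\right) \\
&\ge \phi(H)\left(\frac{K}{8} - DP/2\right),
\end{aligned}
$$

since $K \le \pi(G)/W$, given that every cluster must have weight at least $W$ in $H$, and $\pi(G)/16W^2 = \Omega(\log^3 n) \ge 1/8$. □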

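On the other hand, by Lemma 6.12, we know that the current size of the balanced cut in $H'$ is at most the size of the multicut $L$, given the construction of $(U', V' \setminus U')$:

$$|\delta_{H'}(U')| \le |L| \le \frac{K}{2a_3}. \tag{6.53}$$

The edge loss from the balanced cut $\mathrm{cap}(U, V(H) \setminus U)$ in $H$ is caused by routing the $P$ paths, which can take away at most $DP\omega\log^2 n$ edges. Thus we have:

$$
\begin{aligned}
|\delta_{H'}(U')| + DP\omega\log^2 n &\ge \phi(H)\left(\frac{K}{8} - DP/2\right), \\
\frac{K}{2a_3} + DP\omega\log^2 n &\ge \phi(H)\left(\frac{K}{8} - DP/2\right), \\
DP\left(\omega\log^2 n + \phi(H)/2\right) &\ge \frac{\phi(H)K}{8} - \frac{K}{2a_3}, \\
P &\ge \frac{\phi(H)K/8 - K/2a_3}{a_3\,\omega\log^3 n\,\left(\omega\log^2 n + \phi(H)/2\right)}.
\end{aligned}
$$

By taking $\phi(H) = 1/4$ and $a_3 = 32$, we have $D = 32\omega\log^3 n$ and $P \ge K/\bigl(2048\,\omega\log^3 n\,(\omega\log^2 n + 1/8)\bigr) = \Omega(K/(\omega^2\log^5 n))$, for a constant $\omega$. □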
Imagine - John Lennon


Imagine there's no countries
It isn't hard to do
Nothing  to  kill  or  die  for 
And  no  religion  too 
Imagine  all  the  people 
Living  life  in  peace... 


Imagine  no  possessions 
I  wonder  if  you  can 
No  need  for  greed  or  hunger 
A  brotherhood  of  man 
Imagine  all  the  people 
Sharing  all  the  world... 


7  Global  and  Local  Optima  Lemmas 


Be  careful  what  you  wish  for,  it  might  come  true.  -  J.K.Rowling 

7.1  The  Problem  Definition  and  Preliminaries 

In the next three chapters, we study the following problem: Given a set of $2N$ diploid individuals from two populations $P_1$ and $P_2$, we aim to classify individuals according to their populations of origin, based on only a small amount of their genotype data. Recall that for diploid organisms the chromosomes come in pairs, and a genotype is a list of unordered pairs of alleles such that one comes from each parent. We define $K$ as the number of attributes that we draw from each individual. We aim to minimize $K$ while being able to classify our sample.

We use capital letters $X, Y, Z$ to denote individuals, each of which also represents the observed genotype data corresponding to an individual across a set of $K$ loci. Since each SNP has two variants (alleles), we use bit 1 and bit 0 to denote them. We use their corresponding lower case letters $x^i, y^i, z^i$ to denote their bits at locus $i$. We use $X^i = \{x_a^i : a \in \{1, 2\}\}$ to denote an unordered pair of bits (alleles) at locus $i$ for individual $X$.

In Chapter 9, we consider the case that we are given only a single bit from each of $K$ loci. This corresponds to the classic problem of learning mixtures of two product distributions over the $K$-dimensional Boolean cube $\{0, 1\}^K$, when attribute values across different dimensions are mutually independent.
7.1.1 The Statistical Model and Measure of Distance

Given the population of origin of each individual, the genotypes are assumed to be generated by drawing alleles independently from the appropriate population frequency distribution. We use $p_x^i = \Pr[x_a^i = 1]$ and $p_y^i = \Pr[y_a^i = 1]$, $\forall i = 1, \ldots, K$, to denote the "success" probability (frequency of an allele mapping to bit 1) at locus $i$ in the population of origin of individual $X$ and $Y$ respectively.

In particular, assuming the population of origin for $X$ is $P_1$ and $Y$ comes from population $P_2$: we assume that $x_a^i$, $\forall a \in \{1, 2\}, \forall i$, are independent Bernoulli random variables with success probability $p_x^i$, and $y_a^i$, $\forall a \in \{1, 2\}, \forall i$, are independent Bernoulli random variables with success probability $p_y^i$, and we define

$$
\begin{aligned}
\Pr[x_a^i = 1] &= p_x^i = p_1^i, &\quad (7.1.1) \\
\Pr[x_a^i = 0] &= q_x^i = 1 - p_1^i, &\quad (7.1.2) \\
\Pr[y_a^i = 1] &= p_y^i = p_2^i, &\quad (7.1.3) \\
\Pr[y_a^i = 0] &= q_y^i = 1 - p_2^i. &\quad (7.1.4)
\end{aligned}
$$

For two populations $P_1$ and $P_2$, we call $p_1, p_2$ their centers, where

$$
\begin{aligned}
p_1 &= (p_1^1, p_1^2, \ldots, p_1^K), &\quad (7.1.5) \\
p_2 &= (p_2^1, p_2^2, \ldots, p_2^K). &\quad (7.1.6)
\end{aligned}
$$


We use the following $\gamma$ to measure the average distance between $P_1$ and $P_2$ across their $K$ loci (dimensions).

Definition 7.1. $\gamma(P_1, P_2) = \frac{1}{K}\sum_{i=1}^{K}(p_1^i - p_2^i)^2$.

We use $X \sim p_a$, $\forall a = 1, 2$, to represent that $X$ is a random node from population $P_a$. In the following definitions, we use $X$ to represent the sequence of $K$ unordered pairs of bits that we see from locus 1 to $K$ on node $X$, where an unordered pair of bits refers to one of $\{00, 01, 10, 11\}$. Let $X^k$, $\forall k$, represent the pair of bits at locus $k$ in $X$. Recall that an indicator variable is a discrete random variable that takes on only the value 0 or 1.
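Definition 7.2. Given $X$, we define the following indicator random variables $I_{11}^k, I_{00}^k, I_{(01)}^k$, $\forall k = 1, \ldots, K$, such that:

$$
\begin{aligned}
I_{11}^k(X) &= 1 \;:\; X^k = 11, \\
I_{00}^k(X) &= 1 \;:\; X^k = 00, \\
I_{(01)}^k(X) &= 1 \;:\; X^k = 01 \text{ or } 10,
\end{aligned}
$$

where $X^k$ denotes the unordered pair of bits observed at locus $k$ in $X$.

Definition 7.3. Given two bits $x, y$, $I_{x=y}$ indicates if $x = y$, i.e.,

$$I_{x=y} = 1 \;:\; x = y, \qquad I_{x=y} = 0 \;:\; x \ne y.$$

Definition 7.4. $\forall k \in [1, K]$, let us denote

$$
f^k(X) = I_{00}^k(X) - I_{11}^k(X) =
\begin{cases}
-1 & : I_{11}^k(X) = 1, \\
0 & : I_{(01)}^k(X) = 1, \\
+1 & : I_{00}^k(X) = 1.
\end{cases}
$$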

Finally, we introduce the following theorem and its corollary that we use throughout this thesis. We refer to both as the Hoeffding bound.

Theorem 7.1. (Hoeffding [1963]) If $X_1, X_2, \ldots, X_K$ are independent and $a_i \le X_i \le b_i$, $\forall i = 1, 2, \ldots, K$, and if $\bar X = (X_1 + \ldots + X_K)/K$ and $\mu = E[\bar X]$, then for $t \gt 0$

$$\Pr[\bar X - \mu \ge t] \le e^{-2K^2t^2/\sum_{i=1}^{K}(b_i - a_i)^2}.$$

We also use the following bound for the distribution function of the difference of two sample means.

Corollary 7.1. (Hoeffding [1963]) If $Y_1, \ldots, Y_m, Z_1, \ldots, Z_n$ are independent random variables with values in the interval $[a, b]$, and if $\bar Y = (Y_1 + \ldots + Y_m)/m$, $\bar Z = (Z_1 + \ldots + Z_n)/n$, then for $t \gt 0$

$$\Pr[\bar Y - \bar Z - (E[\bar Y] - E[\bar Z]) \ge t] \le e^{-2t^2/\left((m^{-1} + n^{-1})(b - a)^2\right)}.$$
7.2 Every Edge Has a Perfect Score


In this section, we prove a theorem regarding a score that we assign to each pair of individuals based on their genotype data. Given a large enough $K$, in particular $K = \Omega(\ln N/\gamma^2)$, we show that scores between people from the same population are consistently lower than scores between people from different populations. Using this score, we can construct a complete graph where nodes are individuals and the edge weight is the score between two individuals.

In particular, we call this score $\mathrm{Pscore}(X, Y)$. For an unordered pair of individuals $(X, Y)$,

Definition 7.5.

$$\mathrm{Pscore}(X, Y) = \sum_{i=1}^{K}\mathrm{Pscore}^i(X, Y) = \sum_{i=1}^{K}\mathrm{Pscore}^i\begin{pmatrix} x_1^i & x_2^i \\ y_1^i & y_2^i \end{pmatrix},$$

where

$$\mathrm{Pscore}^i(X, Y) = \frac{1}{2}\left[\left(I_{x_1^i = x_2^i} + I_{y_1^i = y_2^i}\right) - \left(I_{x_1^i = y_1^i} + I_{x_2^i = y_2^i}\right)\right] + \frac{1}{2}\left[\left(I_{x_1^i = x_2^i} + I_{y_1^i = y_2^i}\right) - \left(I_{x_1^i = y_2^i} + I_{x_2^i = y_1^i}\right)\right].$$

Note that this definition utilizes a special quartet construction involving four bits $x_1^i, x_2^i, y_1^i, y_2^i$ that are four independent Bernoulli random variables such that the two bits from each pair $(x_1^i, x_2^i)$, $(y_1^i, y_2^i)$ are identically distributed.

Table 7.1 shows the scores, where the top row denotes $(x_1^i x_2^i)$ while the first column denotes $(y_1^i y_2^i)$; we only define scores for 01 since 01 and 10 are equivalent as an unordered pair to $\mathrm{Pscore}^i(X, Y)$.

    (y1 y2) \ (x1 x2)     00     11     01
    00                     0      2      0
    11                     2      0      0
    01                     0      0     -1

Table 7.1. A table illustrating $\mathrm{Pscore}^i$, $\forall i$, given any four bits.
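
As a concrete check of Definition 7.5 against Table 7.1, the per-locus score can be computed directly from the four bits. This is an illustrative sketch only: the function names `pscore_locus` and `pscore`, and the encoding of a genotype as a list of bit pairs, are assumptions made here rather than notation from the thesis.

```python
def pscore_locus(x, y):
    """Pscore at one locus for genotypes x = (x1, x2) and y = (y1, y2)."""
    def eq(a, b):
        return 1 if a == b else 0

    same = eq(x[0], x[1]) + eq(y[0], y[1])      # I[x1=x2] + I[y1=y2]
    straight = eq(x[0], y[0]) + eq(x[1], y[1])  # I[x1=y1] + I[x2=y2]
    crossed = eq(x[0], y[1]) + eq(x[1], y[0])   # I[x1=y2] + I[x2=y1]
    return (same - straight) / 2 + (same - crossed) / 2

def pscore(X, Y):
    """Sum of per-locus scores over the K loci of two individuals."""
    return sum(pscore_locus(x, y) for x, y in zip(X, Y))
```

Opposite homozygotes score 2, two heterozygotes score -1, and every other combination scores 0, regardless of the order of the bits within a pair.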
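Lemma 7.1. If $X, Y$ come from different populations and $Z_1, Z_2$ are of common origin, then

$$E[\mathrm{Pscore}(X, Y)] = 2K\gamma, \qquad E[\mathrm{Pscore}(Z_1, Z_2)] = 0.$$

Proof. We first define $S_x^i = (p_x^i)^2 + (q_x^i)^2$, $S_y^i = (p_y^i)^2 + (q_y^i)^2$ and $S_{xy}^i = p_x^i p_y^i + q_x^i q_y^i$, where $q_x^i = 1 - p_x^i$ and $q_y^i = 1 - p_y^i$, and hence

$$E\left[I_{x_1^i = x_2^i} + I_{y_1^i = y_2^i} - I_{x_1^i = y_1^i} - I_{x_2^i = y_2^i}\right] = S_x^i + S_y^i - 2S_{xy}^i = 2(p_x^i - p_y^i)^2. \tag{7.2.7}$$

Given that $X, Y$ come from different populations, we apply (7.2.7):

$$
\begin{aligned}
E[\mathrm{Pscore}(X, Y)] &= \sum_{i=1}^{K} E[\mathrm{Pscore}^i(X, Y)] &\quad (7.2.8) \\
&= \frac{1}{2}\sum_{i=1}^{K} E\left[I_{x_1^i = x_2^i} + I_{y_1^i = y_2^i} - \left(I_{x_1^i = y_1^i} + I_{x_2^i = y_2^i}\right)\right] \\
&\quad + \frac{1}{2}\sum_{i=1}^{K} E\left[I_{x_1^i = x_2^i} + I_{y_1^i = y_2^i} - \left(I_{x_1^i = y_2^i} + I_{x_2^i = y_1^i}\right)\right] &\quad (7.2.9) \\
&= 2\sum_{i=1}^{K}(p_1^i - p_2^i)^2 = 2K\gamma. &\quad (7.2.10)
\end{aligned}
$$

For $Z_1, Z_2$, $\forall i$, $p_{z_1}^i = p_{z_2}^i$, hence $E[\mathrm{Pscore}(Z_1, Z_2)] = 0$. □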
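
The identity behind Lemma 7.1 can be verified exactly at a single locus by enumerating all sixteen outcomes of the four bits. The sketch below is only a sanity check under assumed names (`pscore_locus`, `expected_pscore_locus`); it uses exact rational arithmetic so that the comparison with $2(p_x - p_y)^2$ involves no rounding.

```python
from fractions import Fraction
from itertools import product

def pscore_locus(x1, x2, y1, y2):
    """Per-locus Pscore, written out from the quartet of bits."""
    same = (x1 == x2) + (y1 == y2)
    straight = (x1 == y1) + (x2 == y2)
    crossed = (x1 == y2) + (x2 == y1)
    return Fraction(same - straight, 2) + Fraction(same - crossed, 2)

def expected_pscore_locus(px, py):
    """Exact E[Pscore^i] when x-bits are Bernoulli(px) and y-bits Bernoulli(py)."""
    total = Fraction(0)
    for x1, x2, y1, y2 in product((0, 1), repeat=4):
        prob = Fraction(1)
        for bit, p in ((x1, px), (x2, px), (y1, py), (y2, py)):
            prob *= p if bit == 1 else 1 - p
        total += prob * pscore_locus(x1, x2, y1, y2)
    return total
```

For example, with allele frequencies 1/4 and 3/4 the expectation is $2(1/2)^2 = 1/2$, and it is 0 whenever the two frequencies coincide.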

The following theorem does not assume that the sample contains the same number of points from each population.

Theorem 7.2. Let the sample size be $2N$. Given that $K \ge 18\ln N/\gamma^2$, with probability $1 - O(1/N^2)$, for any four individuals $X, Y, Z_1, Z_2$ such that $X, Y$ come from different populations and $Z_1, Z_2$ come from the same population,

$$\mathrm{Pscore}(X, Y) \gt \mathrm{Pscore}(Z_1, Z_2).$$
Proof. We first use the Hoeffding bound to prove the following lemma.

Lemma 7.2. Given that $K \ge 18\ln N/\gamma^2$,

$$\Pr[\mathrm{Pscore}(Z_1, Z_2) \ge K\gamma] \le \frac{1}{N^4}, \qquad \Pr[\mathrm{Pscore}(X, Y) \le K\gamma] \le \frac{1}{N^4}.$$

Proof. Given that $E[\mathrm{Pscore}(Z_1, Z_2)] = 0$ and $\mathrm{Pscore}(Z_1, Z_2) = \sum_{i=1}^{K}\mathrm{Pscore}^i(Z_1, Z_2)$ is the sum of $K$ independent random variables with values in $[-1, 2]$, using the Hoeffding bound as in Theorem 7.1 with $t = K\gamma/K = \gamma$,

$$\Pr[\mathrm{Pscore}(Z_1, Z_2) \ge K\gamma] \le e^{-2K^2\gamma^2/(K \cdot 3^2)} \le \frac{1}{N^4}. \tag{7.2.11}$$

Similarly, given that $E[\mathrm{Pscore}(X, Y)] = 2K\gamma$,

$$
\begin{aligned}
\Pr[\mathrm{Pscore}(X, Y) \le K\gamma] &= \Pr[-\mathrm{Pscore}(X, Y) + E[\mathrm{Pscore}(X, Y)] \ge K\gamma] &\quad (7.2.12) \\
&\le e^{-2K^2\gamma^2/(K \cdot 3^2)} \le \frac{1}{N^4}. &\quad (7.2.13)
\end{aligned}
$$

□

By the union bound, the probability that any event of type $\mathrm{Pscore}(X, Y) \le K\gamma$ or of type $\mathrm{Pscore}(Z_1, Z_2) \ge K\gamma$ happens is at most $4N^2/N^4$, since the total number of such events is at most $2N(2N - 1)$. Hence the theorem holds. □
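
To get a feel for the sample-size requirement in Theorem 7.2, the bound can be evaluated numerically. The sketch below is illustrative only; the function names `loci_needed` and `tail_bound` are hypothetical, and the formulas are the threshold $K \ge 18\ln N/\gamma^2$ and the Hoeffding tail $e^{-2K\gamma^2/9}$ for summands with range 3.

```python
import math

def loci_needed(N, gamma):
    """Smallest integer K with K >= 18 ln(N) / gamma^2."""
    return math.ceil(18 * math.log(N) / gamma ** 2)

def tail_bound(K, gamma):
    """Hoeffding tail exp(-2 K gamma^2 / 9) for K summands with values in [-1, 2]."""
    return math.exp(-2 * K * gamma ** 2 / 9)
```

With this many loci each pairwise failure probability is at most $1/N^4$, so a union bound over fewer than $4N^2$ pairs leaves a total failure probability of at most $4/N^2$.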

7.3  How  to  Learn  Which  Side  to  Join? 

In this section, we study the following problem: Assume that we have separated $2N$ individuals, $N$ from each population of origin, and we are now given a new node $X$ that we need to place on the correct side according to its population of origin. First recall that we use $X$ to represent the individual $X$'s genotype data over its $K$ loci. Hence $X = \{\{x_1^k, x_2^k\}, \forall k\}$ is a sequence of $K$ pairs of unordered bits, which we also refer to loosely as the bit string of $X$.

We prove the Local Optimum Lemma as follows. Given a fixed individual $X$ with a certain bit string, and its $N$ peers from each population, $X_1, \ldots, X_N$ and $Y_1, \ldots, Y_N$, we show that given a large enough $K$, i.e., given enough loci, which is parameterized over $N$ and $\gamma$, with high probability,

$$\sum_{i=1}^{N}\mathrm{Pscore}(X, Y_i) \ge \sum_{i=1}^{N}\mathrm{Pscore}(X, X_i). \tag{7.3.14}$$

Thus we can put $X$ on its own population side given that all of its $2N$ peers have been placed correctly.

Fix the random variable $X$ to be an observed bit string $\mathcal{X}$. Let $\mathrm{Pscore}(\mathcal{X}, Z)$ denote $\mathrm{Pscore}(X, Z \mid X = \mathcal{X})$. The idea of the proof is that while $\mathrm{Pscore}(X, X_i)$, $\mathrm{Pscore}(X, Y_i)$, $\forall i$, are all dependent on the random variable $X$, they become mutually independent once we fix $X$ to an arbitrary bit string $\mathcal{X}$ that we could possibly observe. In other words, the random variables $\mathrm{Pscore}(X, Y_i)$, $\forall i = 1, \ldots, N$, and $\mathrm{Pscore}(X, X_i)$, $\forall i = 1, \ldots, N$, are conditionally independent given $X$ being fixed to $\mathcal{X}$. This allows us to use the Hoeffding bound to prove the Local Optimum Lemma as in Theorem 7.3.

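We define the following random variable $\mathrm{diff}(X)$ such that $\mathrm{diff}(\mathcal{X})$ represents the expected difference of two conditionally independent random variables $\mathrm{Pscore}(\mathcal{X}, Y_i)$, $\mathrm{Pscore}(\mathcal{X}, X_i)$, each of which depends on either $Y_i$ or $X_i$ given that $X$ is fixed to $\mathcal{X}$. Hence expectations are taken over all possible realizations of $Y_i$, $X_i$ respectively.

Definition 7.6. Let $X \sim p_1$ be a node from $P_1$ and $Y \sim p_2$ be a node from $P_2$. Let $Z_1 \sim p_1$, $Z_2 \sim p_2$ be two nodes randomly drawn from $P_1$ and $P_2$ respectively.

$$
\begin{aligned}
\mathrm{diff}(X) &= E_{Z_2 \sim p_2}[\mathrm{Pscore}(X, Z_2)] - E_{Z_1 \sim p_1}[\mathrm{Pscore}(X, Z_1)], \\
\mathrm{diff}(Y) &= E_{Z_1 \sim p_1}[\mathrm{Pscore}(Y, Z_1)] - E_{Z_2 \sim p_2}[\mathrm{Pscore}(Y, Z_2)].
\end{aligned}
$$

Remark 7.1. Hence $\mathrm{diff}(\mathcal{X})$ is determined by node $X$'s bit string $\mathcal{X}$. $\mathrm{diff}(X)$ is a random variable that is completely determined by the outcome $\mathcal{X}$ of $X$. For a fixed outcome $\mathcal{X}$, $\mathrm{diff}(\mathcal{X})$ is a fixed value.


Proposition 7.1. $\forall a = 1, 2$, $E_{X \sim p_a}[\mathrm{diff}(X)] = 2K\gamma$.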
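Proof. W.l.o.g., assume that $X \sim p_1$; for $X \sim p_2$, the proof is similar.

$$
\begin{aligned}
E_{X \sim p_1}[\mathrm{diff}(X)] &= E_{X \sim p_1}\left[E_{Z_2 \sim p_2}[\mathrm{Pscore}(X, Z_2)] - E_{Z_1 \sim p_1}[\mathrm{Pscore}(X, Z_1)]\right] \\
&= E_{X \sim p_1, Z_2 \sim p_2}[\mathrm{Pscore}(X, Z_2)] - E_{X \sim p_1, Z_1 \sim p_1}[\mathrm{Pscore}(X, Z_1)] \\
&= 2K\gamma,
\end{aligned}
$$

where the last step is due to Lemma 7.1. □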

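We show that when $K$ is big enough, which is parameterized over $N$ and $\gamma$, with high probability, the observed bit string $\mathcal{X}$ conforms to what we expect to see; specifically, this is reflected in the bounded deviation of $\mathrm{diff}(X)$ from its expected value $2K\gamma$, given $\mathcal{X}$. Indeed,

Lemma 7.3. Given that $K \ge 8\ln(1/\tau)/\gamma$, $\Pr[\mathrm{diff}(X) \ge K\gamma] \ge 1 - \tau$.

Proof. Given $\mathcal{X}$, we use the indicator random variables $I_{11}^k(\mathcal{X}), I_{00}^k(\mathcal{X}), I_{(01)}^k(\mathcal{X})$, $\forall k = 1, \ldots, K$, as defined in Definition 7.2, where $\mathcal{X}^k = x_1^k x_2^k$ denotes the unordered pair of bits observed at locus $k$ in $\mathcal{X}$.

We first show the following claim.

Claim 7.1. Let $\mathcal{X}$ be any sample point. $\forall a = 1, 2$,

$$E_{Y \sim p_a}[\mathrm{Pscore}(X, Y \mid X = \mathcal{X})] = 2\sum_{k=1}^{K} I_{11}^k(\mathcal{X})(q_a^k)^2 + 2\sum_{k=1}^{K} I_{00}^k(\mathcal{X})(p_a^k)^2 - 2\sum_{k=1}^{K} I_{(01)}^k(\mathcal{X})\,p_a^k q_a^k.$$

Proof. It is straightforward to verify that for each locus $i$,

$$
\mathrm{Pscore}^i(X, Y) =
\begin{cases}
2 & : \{X^i, Y^i\} = \{00, 11\}, \\
-1 & : X^i, Y^i \in \{01, 10\}, \\
0 & : \text{otherwise.}
\end{cases}
$$

Therefore we have

$$
\begin{aligned}
E_{Y \sim p_a}[\mathrm{Pscore}(X, Y \mid X = \mathcal{X})] &= \sum_{k=1}^{K} E_{Y \sim p_a}\left[\mathrm{Pscore}^k(X, Y \mid X = \mathcal{X})\right] \\
&= 2\sum_{k=1}^{K}\left(I_{11}^k(\mathcal{X})(q_a^k)^2 + I_{00}^k(\mathcal{X})(p_a^k)^2 - \tfrac{1}{2}I_{(01)}^k(\mathcal{X})\,2p_a^k q_a^k\right) \\
&= 2\sum_{k=1}^{K} I_{11}^k(\mathcal{X})(q_a^k)^2 + 2\sum_{k=1}^{K} I_{00}^k(\mathcal{X})(p_a^k)^2 - 2\sum_{k=1}^{K} I_{(01)}^k(\mathcal{X})\,p_a^k q_a^k.
\end{aligned}
$$

□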


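We next use Claim 7.1, and the function $f^k(\mathcal{X}) = I_{00}^k(\mathcal{X}) - I_{11}^k(\mathcal{X})$ as specified in Definition 7.4, to derive (7.3.15) for $X \sim p_1$ and (7.3.16) for $Y \sim p_2$, where

$$
\begin{aligned}
U_k &= (p_2^k)^2 - (p_1^k)^2, \\
W_k &= (q_2^k)^2 - (q_1^k)^2, \\
V_k &= p_2^k q_2^k - p_1^k q_1^k, \\
S &= 2\sum_{k=1}^{K}(p_2^k - p_1^k)(p_2^k + p_1^k - 1);
\end{aligned}
$$

$$
\begin{aligned}
\mathrm{diff}(\mathcal{X}) &= E_{Z_2 \sim p_2}[\mathrm{Pscore}(X, Z_2 \mid X = \mathcal{X})] - E_{Z_1 \sim p_1}[\mathrm{Pscore}(X, Z_1 \mid X = \mathcal{X})] \\
&= 2\sum_{k=1}^{K} I_{11}^k(\mathcal{X})\,W_k + 2\sum_{k=1}^{K} I_{00}^k(\mathcal{X})\,U_k - 2\sum_{k=1}^{K} I_{(01)}^k(\mathcal{X})\,V_k \\
&= 2\sum_{k=1}^{K}(p_2^k - p_1^k)\left(I_{00}^k(\mathcal{X}) - I_{11}^k(\mathcal{X})\right) + 2\sum_{k=1}^{K}(p_2^k - p_1^k)(p_2^k + p_1^k - 1) \\
&= 2\sum_{k=1}^{K}(p_2^k - p_1^k)\,f^k(\mathcal{X}) + S, \qquad (7.3.15)
\end{aligned}
$$

$$
\begin{aligned}
\mathrm{diff}(\mathcal{Y}) &= E_{Z_1 \sim p_1}[\mathrm{Pscore}(Y, Z_1 \mid Y = \mathcal{Y})] - E_{Z_2 \sim p_2}[\mathrm{Pscore}(Y, Z_2 \mid Y = \mathcal{Y})] \\
&= 2\sum_{k=1}^{K}(p_1^k - p_2^k)\left(I_{00}^k(\mathcal{Y}) - I_{11}^k(\mathcal{Y})\right) + 2\sum_{k=1}^{K}(p_1^k - p_2^k)(p_2^k + p_1^k - 1) \\
&= 2\sum_{k=1}^{K}(p_1^k - p_2^k)\,f^k(\mathcal{Y}) - S. \qquad (7.3.16)
\end{aligned}
$$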
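
The passage from Claim 7.1 to the closed form (7.3.15) is a purely algebraic identity, so it can be checked exactly on a small instance. The following sketch is illustrative: the function names and the encoding of a bit string as a list of genotype labels such as "00" are assumptions made here, and exact fractions are used so the two sides can be compared for equality.

```python
from fractions import Fraction

def expected_score(genotypes, p):
    """Claim 7.1: E_{Y ~ p}[Pscore(X, Y) | X], with X given as labels like '00'."""
    total = Fraction(0)
    for g, pk in zip(genotypes, p):
        qk = 1 - pk
        if g == "11":
            total += 2 * qk * qk
        elif g == "00":
            total += 2 * pk * pk
        else:                      # heterozygous: '01' or '10'
            total -= 2 * pk * qk
    return total

def diff_direct(genotypes, p1, p2):
    """diff for a node of population 1, straight from Definition 7.6."""
    return expected_score(genotypes, p2) - expected_score(genotypes, p1)

def diff_closed_form(genotypes, p1, p2):
    """Right-hand side of (7.3.15): 2 * sum (p2 - p1) f^k + S."""
    f = {"00": 1, "11": -1}        # f^k is 0 on heterozygous loci
    linear = sum(2 * (b - a) * f.get(g, 0) for g, a, b in zip(genotypes, p1, p2))
    S = sum(2 * (b - a) * (a + b - 1) for a, b in zip(p1, p2))
    return linear + S
```

Both functions return the same value for any bit string and any pair of frequency vectors, which is the content of (7.3.15).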



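Let us define $\mu_f$ as follows; by Proposition 7.1, (7.3.15) and (7.3.16),

$$
\begin{aligned}
E_{X \sim p_1}\left[2\sum_{k=1}^{K}(p_2^k - p_1^k)\,f^k(X)\right] &= \mu_f &\quad (7.3.17) \\
&= E_{X \sim p_1}[\mathrm{diff}(X)] - S \\
&= 2K\gamma - S, &\quad (7.3.18) \\
E_{Y \sim p_2}\left[2\sum_{k=1}^{K}(p_1^k - p_2^k)\,f^k(Y)\right] &= E_{Y \sim p_2}[\mathrm{diff}(Y)] + S \\
&= 2K\gamma + S. &\quad (7.3.19)
\end{aligned}
$$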

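We now show the following claims using the Hoeffding bound; we only show the proof for Claim 7.2, since the proof for the other one is similar.

Claim 7.2. Given that $K \ge 8\ln(1/\tau)/\gamma$,

$$\Pr_{X \sim p_1}\left[\sum_{k=1}^{K} 2(p_2^k - p_1^k)\,f^k(X) - \mu_f \le -K\gamma\right] \le \tau.$$

Claim 7.3. Given that $K \ge 8\ln(1/\tau)/\gamma$,

$$\Pr_{Y \sim p_2}\left[\sum_{k=1}^{K} 2(p_1^k - p_2^k)\,f^k(Y) - (2K\gamma + S) \le -K\gamma\right] \le \tau.$$

Proof of Claim 7.2: Given that each observed bit in $X$ is an independent Bernoulli random variable, $f^1(X), f^2(X), \ldots, f^K(X)$ are independent and $-2\sqrt{\gamma_k} \le 2(p_2^k - p_1^k)\,f^k(X) \le 2\sqrt{\gamma_k}$, where $\gamma_k = (p_2^k - p_1^k)^2$, $\forall k$; then for $t = K\gamma/K = \gamma$,

$$\Pr_{X \sim p_1}\left[\sum_{k=1}^{K} 2(p_2^k - p_1^k)\,f^k(X) - \mu_f \le -K\gamma\right] \le e^{-2K^2\gamma^2/\sum_{k=1}^{K}(4\sqrt{\gamma_k})^2} = e^{-K\gamma/8} \le \tau.$$

□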
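These claims immediately imply Lemma 7.3, since

$$
\begin{aligned}
\Pr_{X \sim p_1}[\mathrm{diff}(X) \le K\gamma] &= \Pr_{X \sim p_1}\left[\sum_{k=1}^{K} 2(p_2^k - p_1^k)\,f^k(X) - (2K\gamma - S) \le -K\gamma\right] \le \tau, \\
\Pr_{Y \sim p_2}[\mathrm{diff}(Y) \le K\gamma] &= \Pr_{Y \sim p_2}\left[\sum_{k=1}^{K} 2(p_1^k - p_2^k)\,f^k(Y) - (2K\gamma + S) \le -K\gamma\right] \le \tau.
\end{aligned}
$$

□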


W.l.o.g., we assume that $X \sim p_1$. We have shown that there is a significant difference in the expected values, given enough multilocus genotype data (e.g., SNPs); that is, with high probability, we will observe a bit string $\mathcal{X}$ of node $X \sim p_1$ such that $\forall i = 1, \ldots, N$,

$$E_{Y_i \sim p_2}[\mathrm{Pscore}(X, Y_i \mid X = \mathcal{X})] - E_{X_i \sim p_1}[\mathrm{Pscore}(X, X_i \mid X = \mathcal{X})] \ge K\gamma, \tag{7.3.20}$$

due to the bounded amount of deviation in the random variable $\mathrm{diff}(X)$ from its expected value, when evaluated at $\mathcal{X}$. A similar statement holds for $Y \sim p_2$.

We are ready to show the theorem of this section. The theorem shows that with high probability, we can place a node $X$ on the correct side given a sufficient number of loci and $N$ random peers from each population. In particular, the failure probability comes from either $X$ being a bad node such that $\mathrm{diff}(\mathcal{X}) \lt K\gamma$, or from a certain large deviation event for the sum of $2KN$ conditionally independent random variables, as shown in the proof.

Theorem 7.3. Let $K \ge \max\left\{\frac{8\ln(1/\tau)}{\gamma}, \frac{9\ln(1/\delta)}{N\gamma^2}\right\}$. For any $X \sim p_1$ and its observed bit string $\mathcal{X}$, and $2N$ individuals $X_i \sim p_1$, $Y_i \sim p_2$, $\forall i = 1, \ldots, N$, that are randomly drawn from their populations of origin, with probability $1 - \tau - \delta$,

$$\sum_{i=1}^{N}\mathrm{Pscore}(X, Y_i \mid X = \mathcal{X}) \ge \sum_{i=1}^{N}\mathrm{Pscore}(X, X_i \mid X = \mathcal{X}).$$

A similar statement holds for $Y \sim p_2$.

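Proof. Given an individual $X$ and its observed bit string $\mathcal{X}$, we first define $2NK$ conditionally independent random variables, such that $\forall i = 1, \ldots, N$, $\mathrm{Pscore}(Y_i, X \mid X = \mathcal{X})$ and $\mathrm{Pscore}(X_i, X \mid X = \mathcal{X})$ each contribute $K$ conditionally independent random variables that are also conditionally independent with respect to all other $(2N - 1)K$ such random variables.

Let $Y_i^k = \mathrm{Pscore}^k(X, Y_i \mid X = \mathcal{X})$ and $Z_i^k = \mathrm{Pscore}^k(X, X_i \mid X = \mathcal{X})$ be the $2NK$ conditionally independent random variables with values in $[-1, 2]$, and

$$
\begin{aligned}
\bar Y &= \sum_{i=1}^{N}\sum_{k=1}^{K}\mathrm{Pscore}^k(X, Y_i \mid X = \mathcal{X})/NK, &\quad (7.3.21) \\
\bar Z &= \sum_{i=1}^{N}\sum_{k=1}^{K}\mathrm{Pscore}^k(X, X_i \mid X = \mathcal{X})/NK. &\quad (7.3.22)
\end{aligned}
$$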

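Claim 7.4. Given that $K \ge 8\ln(1/\tau)/\gamma$ and a particular bit string $\mathcal{X}$ for node $X \sim p_1$, with probability $1 - \tau$,

$$E[\bar Y] - E[\bar Z] \ge \gamma.$$

Proof. Using Lemma 7.3, we have $\mathrm{diff}(\mathcal{X}) \ge K\gamma$ with probability $1 - \tau$, given that $K \ge 8\ln(1/\tau)/\gamma$.

Hence, given that individuals $X_i \sim p_1$, $Y_i \sim p_2$, $\forall i = 1, \ldots, N$, are randomly drawn from their populations of origin, we have with probability $1 - \tau$,

$$
\begin{aligned}
E[\bar Y] - E[\bar Z] &= E[\bar Y - \bar Z] \\
&= \frac{1}{NK}\sum_{i=1}^{N} E\left[\mathrm{Pscore}(X, Y_i \mid X = \mathcal{X}) - \mathrm{Pscore}(X, X_i \mid X = \mathcal{X})\right] \\
&= \frac{1}{NK}\sum_{i=1}^{N}\mathrm{diff}(\mathcal{X}) \ge NK\gamma/NK = \gamma.
\end{aligned}
$$

□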


Assuming that we have indeed observed a bit string $x$ such that $E[\bar{Y}] - E[\bar{Z}] \ge \gamma$, we apply Corollary 7.1 of Theorem 7.1. Given that $t = E[\bar{Y}] - E[\bar{Z}] \ge \gamma$ and


—  ’ 


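\begin{align*}
&\Pr\Big[\sum_{i=1}^{N} \mathrm{Pscore}(X, Y_i \mid X = x) - \sum_{i=1}^{N} \mathrm{Pscore}(X, X_i \mid X = x) < 0 \;\Big|\; E[\bar{Y}] - E[\bar{Z}] \ge \gamma\Big] \\
&\quad = \Pr\big[\bar{Y} - \bar{Z} - (E[\bar{Y}] - E[\bar{Z}]) < -(E[\bar{Y}] - E[\bar{Z}])\big] \\
&\quad = \Pr\big[-(\bar{Y} - \bar{Z}) + (E[\bar{Y}] - E[\bar{Z}]) > (E[\bar{Y}] - E[\bar{Z}])\big]
\end{align*}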

<  <  cW/aT 


<  5. 


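Thus the total probability of a bad event is

\begin{align*}
&\Pr\Big[\sum_{i=1}^{N} \mathrm{Pscore}(X, Y_i \mid X = x) - \sum_{i=1}^{N} \mathrm{Pscore}(X, X_i \mid X = x) < 0\Big] \\
&\quad \le \Pr\big[E[\bar{Y}] - E[\bar{Z}] < \gamma\big] \\
&\qquad + \Pr\Big[\sum_{i=1}^{N} \mathrm{Pscore}(X, Y_i \mid X = x) - \sum_{i=1}^{N} \mathrm{Pscore}(X, X_i \mid X = x) < 0 \;\Big|\; E[\bar{Y}] - E[\bar{Z}] \ge \gamma\Big] \\
&\quad \le \tau + \delta.
\end{align*}

□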

Corollary  7.2.  Let  K  >  max  { For  any  X^pi  and  its  observed  string 
X,  with  probability  1  —  0{\jN^),  given  that  Xj  ~  pi,F  ~  .P25V/  are  individuals 
randomly  drawn  from  their  population  of  origin,  we  have 

\[
\sum_{i=1}^{N} \mathrm{Pscore}(X, Y_i \mid X = x) > \sum_{i=1}^{N} \mathrm{Pscore}(X, X_i \mid X = x).
\]

A  similar  statement  holds  for  Y  ~  p2. 
8 Recognizing a Perfect Partition


8.1  Introduction 

In this chapter, we aim to classify a balanced input instance and derive a tighter bound on $K$ based on the Pscore that we define in Section 7.2 and the complete graph that we construct, where nodes are individuals and the weight of an edge is the score between its two endpoints. Let $P_1$ represent the set of nodes $X_1, X_2, \dots, X_N$ from population 1, and $P_2$ represent the set of nodes $Y_1, Y_2, \dots, Y_N$ from population 2. Recall that a cut $(S, \bar{S})$ refers to the set of edges with exactly one endpoint in $S$. We define Pscore for a cut $(S, \bar{S})$ as the sum of Pscores over the set of edges in $(S, \bar{S})$. When we say Pscore over an edge $e = (u, v)$, we refer to $\mathrm{Pscore}(u, v)$.

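Consider a balanced cut $(S, \bar{S})$, as shown in Figure 8.1.1, where

\[
S = \{X_i \in P_1,\ i = 1,\dots,N-L;\ V_j \in P_2,\ j = 1,\dots,L\}, \tag{8.1.1}
\]

\[
\bar{S} = \{Y_i \in P_2,\ i = 1,\dots,N-L;\ U_j \in P_1,\ j = 1,\dots,L\}, \tag{8.1.2}
\]

and $L \in [1, N/2]$ is the number of nodes that have been swapped from one side of $\mathcal{T}$ to the other. By definition,

\begin{align*}
\mathrm{Pscore}(S, \bar{S}) ={}& \sum_{i=1}^{N-L}\sum_{j=1}^{N-L} \mathrm{Pscore}(X_i, Y_j) + \sum_{i=1}^{L}\sum_{j=1}^{L} \mathrm{Pscore}(U_i, V_j) \\
&+ \sum_{i=1}^{N-L}\sum_{j=1}^{L} \big(\mathrm{Pscore}(X_i, U_j) + \mathrm{Pscore}(Y_i, V_j)\big), \tag{8.1.3}
\end{align*}

which defines $\mathrm{Pscore}(\mathcal{T})$ when $L = 0$, i.e.,

\[
\mathrm{Pscore}(\mathcal{T}) = \sum_{i=1}^{N}\sum_{j=1}^{N} \mathrm{Pscore}(X_i, Y_j). \tag{8.1.4}
\]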

Figure 8.1.1. Edges that are different between a perfect partition $\mathcal{T}$ and another balanced partition $(S, \bar{S})$, seen only from $U_1 \sim p_1$ and $V_1 \sim p_2$; red dotted edges are in $\mathcal{T}$ and green solid edges are in $(S, \bar{S})$.

It is easy to verify that in expectation, the perfect partition has the maximum Pscore, i.e., for every balanced $(S, \bar{S})$ other than $\mathcal{T}$, $E[\mathrm{Pscore}(\mathcal{T})] > E[\mathrm{Pscore}(S, \bar{S})]$. Furthermore, the following theorem says that this is also true with high probability, given a large enough $K$. Formally,

Theorem 8.1. Given that $K = \Omega(\ln N/\gamma)$ and $KN = \Omega(\ln N \log\log N/\gamma^2)$, where $N \ge 8$, with probability $1 - 1/\mathrm{poly}(N)$, for every other balanced cut $(S, \bar{S})$ in the complete graph formed among $2N$ nodes, we have

\[
\mathrm{Pscore}(\mathcal{T}) > \mathrm{Pscore}(S, \bar{S}).
\]

Remark 8.1. When $N = \Omega(\log\log N/\gamma)$, i.e., when we have enough individuals from each population, $K = \Omega(\ln N/\gamma)$ becomes the only constraint.
8.1.1 The Approach

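We compare each balanced cut $(S, \bar{S})$ against the perfect partition $\mathcal{T}$, and define a random variable $\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)$ as in (8.1.5) to capture their difference. Figure 8.1.1 shows the nodes that we refer to in (8.1.6),

\begin{align*}
\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) &= \mathrm{Pscore}(\mathcal{T}) - \mathrm{Pscore}(S, \bar{S}) \tag{8.1.5} \\
&= \sum_{j=1}^{L}\sum_{i=1}^{N-L} \big(\mathrm{Pscore}(V_j, X_i) - \mathrm{Pscore}(V_j, Y_i)\big) \\
&\quad + \sum_{j=1}^{L}\sum_{i=1}^{N-L} \big(\mathrm{Pscore}(U_j, Y_i) - \mathrm{Pscore}(U_j, X_i)\big). \tag{8.1.6}
\end{align*}

Thus, for a particular balanced cut $(S, \bar{S})$, $\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) > 0$ immediately implies that $\mathrm{Pscore}(\mathcal{T}) > \mathrm{Pscore}(S, \bar{S})$. This is what Theorem 8.1 aims to prove for all balanced cuts.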

In more detail, we refer to $X_i \in (S \cap P_1)$ and $Y_i \in (\bar{S} \cap P_2)$, $\forall i \in [1, N-L]$, as unswapped nodes, since they belong to the majority type on their own side; we refer to $V_j \in (S \cap P_2)$ and $U_j \in (\bar{S} \cap P_1)$, $\forall j \in [1, L]$, as swapped nodes, since they are the minority on their new side.

The random variable $\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)$, $\forall\, N/2 \ge L \ge 1$, consists exactly of the Pscores over the set of edges that differ between $\mathcal{T}$ and $(S, \bar{S})$, which is exactly the set of $4L(N-L)$ edges between swapped nodes and unswapped nodes, among which $4(N-L)$ edges are shown in Figure 8.1.1.
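The edge count and the decomposition of the difference can be checked numerically on a small instance. The following is a minimal sketch, not part of the original analysis: the node labels, the helper names `cut_edges`, `build_instance`, and `check`, and the use of arbitrary random weights in place of actual Pscore values are all assumptions made for illustration.

```python
import itertools
import random


def cut_edges(side_a, side_b):
    """Return the set of edges with one endpoint on each side."""
    return {frozenset((u, v)) for u in side_a for v in side_b}


def build_instance(n, l):
    """Perfect partition T and a balanced cut with l swapped nodes per side."""
    xs = [("X", i) for i in range(n - l)]   # unswapped, population 1
    us = [("U", j) for j in range(l)]       # swapped, population 1
    ys = [("Y", i) for i in range(n - l)]   # unswapped, population 2
    vs = [("V", j) for j in range(l)]       # swapped, population 2
    perfect = cut_edges(xs + us, ys + vs)   # T separates P1 from P2
    swapped = cut_edges(xs + vs, ys + us)   # S holds the X's and the V's
    return xs, us, ys, vs, perfect, swapped


def check(n, l, seed=0):
    """Return (#differing edges, Pscore(T) - Pscore(S, S-bar), swapped-node sum)."""
    rng = random.Random(seed)
    xs, us, ys, vs, perfect, swapped = build_instance(n, l)
    nodes = xs + us + ys + vs
    # Arbitrary symmetric weights stand in for Pscore values (hypothetical data).
    score = {frozenset(e): rng.uniform(-1, 2)
             for e in itertools.combinations(nodes, 2)}
    differing = perfect ^ swapped
    lhs = sum(score[e] for e in perfect) - sum(score[e] for e in swapped)
    # Sum over edges between swapped and unswapped nodes only.
    rhs = sum(score[frozenset((v, x))] - score[frozenset((v, y))]
              for v in vs for x, y in zip(xs, ys))
    rhs += sum(score[frozenset((u, y))] - score[frozenset((u, x))]
               for u in us for x, y in zip(xs, ys))
    return len(differing), lhs, rhs
```

For example, `check(6, 2)` reports 32 differing edges, which equals $4L(N-L)$ for $N = 6$ and $L = 2$, and its last two return values agree up to floating-point rounding.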

In particular, for $(S, \bar{S})$, as shown in Figure 8.1.1, the original cut edges $\{(V_j, X_i), (U_j, Y_i),\ \forall j \in [1, L],\ \forall i \in [1, N-L]\}$ that belong to $\mathcal{T}$ are replaced with $\{(V_j, Y_i), (U_j, X_i),\ \forall j \in [1, L],\ \forall i \in [1, N-L]\}$, which are the new edges that appear in $(S, \bar{S})$; these new edges together with the set of common edges that belong to $\mathcal{T} \cap (S, \bar{S})$ form $(S, \bar{S})$. Hence we only need to consider the influence of the $2NK$ random pairs of bits over these two sets of edges, as shown in (8.1.6), $\forall (S, \bar{S})$.

In particular, observe that all random variables $\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)$, $\forall (S, \bar{S})$, $\forall L > 0$, have positive expected values, as in Proposition 8.1, as their initial advantage; thus we need to show that the deviation of each random variable from its expected value is less than the expected advantage with high probability.

Proposition 8.1. $E[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)] = 4L(N-L)K\gamma$, where the expectation is over all $K$ random pairs of bits of $X_i, Y_i$, $\forall i \in [1, N-L]$, and $U_j, V_j$, $\forall j \in [1, L]$.

We  include  in  the  next  section  some  preliminaries  on  probability  theory  due  to 
our  intensive  use  of  these  terms  in  this  chapter. 

8.2  Preliminaries  On  Probability  Theory 

Most  of  the  following  definitions  come  from  the  textbook  Randomized  Algorithms 
by  Motwani  and  Raghavan  [1995].  Some  others  come  from  a  paper  by  Chung  and 
Lu  [2006]. 

In all definitions below, we shall be thinking of some sample space $\Omega$, and when we speak of the complement of $A$, denoted as $A^c$, we mean all those elements of $\Omega$ which are not elements of $A$.

First recall that a σ-field is the following.

Definition 8.1. A σ-field $(\Omega, \mathcal{F})$ consists of a sample space $\Omega$ and a collection of subsets of $\Omega$, denoted as $\mathcal{F}$, satisfying the following conditions.

- $\emptyset \in \mathcal{F}$.

- If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$.

- If $A_1, A_2, \dots$ is a sequence of elements of $\mathcal{F}$ then $\bigcup_i A_i \in \mathcal{F}$.

Definition 8.2. Given a σ-field $(\Omega, \mathcal{F})$, a probability measure $\Pr : \mathcal{F} \to \mathbb{R}^+$ is a function that satisfies the following conditions.

- $\forall A \in \mathcal{F}$, $0 \le \Pr[A] \le 1$.

- $\Pr[\Omega] = 1$.

- For mutually disjoint events $E_1, E_2, \dots$, $\Pr[\bigcup_i E_i] = \sum_i \Pr[E_i]$.

Definition 8.3. A probability space $(\Omega, \mathcal{F}, \Pr)$ consists of a σ-field $(\Omega, \mathcal{F})$ with a probability measure $\Pr$ defined on it.
Definition 8.4. Given the σ-field $(\Omega, \mathcal{F})$ with $\mathcal{F} = 2^{\Omega}$, a filter $\mathbf{F}$ is a nested sequence $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \dots \subseteq \mathcal{F}_n$ of subsets of $2^{\Omega}$ such that

- $\mathcal{F}_0 = \{\emptyset, \Omega\}$.

- $\mathcal{F}_n = 2^{\Omega}$.

- for $0 \le i \le n$, $(\Omega, \mathcal{F}_i)$ is a σ-field.

Definition 8.5. If $E_1, E_2, \dots$ are disjoint events that partition $\Omega$, then an event is in the generated σ-field $\mathcal{F}$ if and only if it can be expressed as a union of some subset of the events $E_1, E_2, \dots$; we refer to $E_1, E_2, \dots$ as the elementary events in the σ-field $\mathcal{F}$.
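To make the notions of a generated σ-field and a filter concrete, the sketch below builds the σ-field generated by a finite partition and checks the σ-field conditions on a four-point sample space. It is an illustrative assumption-laden example: the function names and the two-bit sample space are chosen here and do not come from the source.

```python
from itertools import combinations


def generated_sigma_field(blocks):
    """All unions of sub-collections of the given partition blocks."""
    blocks = [frozenset(b) for b in blocks]
    events = set()
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            events.add(frozenset().union(*chosen))
    return events


def is_sigma_field(omega, events):
    """Check the sigma-field conditions on a finite sample space."""
    omega = frozenset(omega)
    if frozenset() not in events:
        return False
    if any(omega - a not in events for a in events):
        return False
    # On a finite space, closure under pairwise union suffices.
    return all(a | b in events for a in events for b in events)


# Hypothetical sample space: outcomes of two bits, revealed one at a time.
omega = ["00", "01", "10", "11"]
f0 = generated_sigma_field([omega])                         # trivial partition
f1 = generated_sigma_field([["00", "01"], ["10", "11"]])    # first bit known
f2 = generated_sigma_field([[w] for w in omega])            # singletons
```

Here `f0`, `f1`, `f2` are nested and each is a σ-field, so together they form a filter on this small space: each partition refines the previous one as one more bit is revealed.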

Remark 8.2. An intuitive view of Definition 8.4 can be obtained by associating with each $\mathcal{F}_i$ a partition of $\Omega$ into blocks $B^i_1, B^i_2, \dots$ such that the events $B^i_j$ generate the σ-field $\mathcal{F}_i$. Furthermore, the partition associated with $\mathcal{F}_{i+1}$ is a refinement of the partition associated with $\mathcal{F}_i$, and $\mathcal{F}_0$ is generated by the trivial partition while $\mathcal{F}_n$ is generated by the partition of $\Omega$ into the singleton sets containing the sample points.

Definition 8.6. (Alon and Spencer [1992]) A martingale is a sequence $X_0, \dots, X_n$ of random variables so that for $0 \le i < n$,

\[
E[X_{i+1} \mid X_i, X_{i-1}, \dots, X_0] = X_i.
\]

Finally, for the sake of completeness, we adopt the following definitions from Chung and Lu [2006] in order to introduce Definition 8.9, which is equivalent to Definition 8.6 for the finite cases.


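Definition 8.7. If $f : \Omega \to \mathbb{R}$ is a function, we define the expectation $E[f] = E[f(x) \mid x \in \Omega]$ by

\[
E[f] = \sum_{x \in \Omega} f(x)\Pr[x].
\]

Definition 8.8. If $\mathcal{F}$ is a σ-field on $\Omega$, we define the conditional expectation $E[f \mid \mathcal{F}] : \Omega \to \mathbb{R}$ by

\[
E[f \mid \mathcal{F}](x) = \frac{1}{\sum_{y \in \mathcal{F}(x)} \Pr[y]} \sum_{y \in \mathcal{F}(x)} f(y)\Pr[y],
\]

where $\mathcal{F}(x)$ is the smallest element of $\mathcal{F}$ that contains $x$.

Definition 8.9. A martingale obtained from random variable $X$ is associated with a filter $\mathbf{F}$: $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \dots \subseteq \mathcal{F}_m = 2^{\Omega}$ and a sequence of random variables $X_0, X_1, \dots, X_m$ satisfying

\[
X_i = E[X \mid \mathcal{F}_i], \tag{8.2.7}
\]

and in particular, $X_0 = E[X]$ and $X_m = X$.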
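The construction in Definition 8.9 can be illustrated by brute force on a few independent bits. The sketch below is a hypothetical example (the function name `doob_martingale`, the bit probabilities, and the choice of $f$ as the number of ones are assumptions): it computes $X_i = E[f \mid \text{first } i \text{ bits}]$ for every outcome using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def doob_martingale(f, probs):
    """Return {outcome: [X_0, ..., X_m]} with X_i = E[f | first i bits].

    probs[i] is the probability that bit i equals 1 (strictly between 0
    and 1); the bits are assumed to be independent.
    """
    m = len(probs)
    outcomes = list(product((0, 1), repeat=m))

    def weight(w):
        result = Fraction(1)
        for bit, p in zip(w, probs):
            result *= p if bit else 1 - p
        return result

    table = {}
    for w in outcomes:
        row = []
        for i in range(m + 1):
            # Block of the partition for F_i that contains w.
            block = [y for y in outcomes if y[:i] == w[:i]]
            total = sum(weight(y) for y in block)
            row.append(sum(f(y) * weight(y) for y in block) / total)
        table[w] = row
    return table


probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]
table = doob_martingale(sum, probs)   # f counts the ones in the outcome
```

In this example every row starts at $E[f] = 13/12$ and ends at $f(w)$ itself, matching $X_0 = E[X]$ and $X_m = X$; averaging $X_{i+1}$ over the next bit recovers $X_i$, which is the martingale property of Definition 8.6.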

8.3  Proof  Techniques  and  Some  Notation 

We first introduce some notation regarding the simple probability space $(\Omega, \mathcal{F}, \Pr)$ as follows. The set $\Omega$ is the set of all possible outcomes for $2NK$ pairs of random bits, where we denote each pair with $b(j,k)$ for individual $j$ at position $k$. The σ-field $\mathcal{F}$ of events is the set $\Gamma(\Omega)$ of all subsets of $\Omega$; and the probability measure $\Pr$ is based on the product of probabilities of each pair of random bits $b(j,k)$, $\forall j, k$, corresponding to $\mathrm{Bernoulli}(p^k_a)$, where $a \in \{1,2\}$ depends on the population of origin for individual $j$. Formally,

Definition 8.10. The elementary events in the underlying sample space $(\Omega, \mathcal{F}, \Pr)$ are all possible $4^{2NK}$ choices of $n = 2NK$ pairs of bits. For $0 \le i \le n$ and $w \in \{00, 01, 10, 11\}^i$, let $B_w$ denote the event that the first $i$ pairs of bits equal the bit string $w$. Let $\mathcal{F}_i$ be the σ-field generated by the partition of $\Omega$ into blocks $B_w$, for $w \in \{00, 01, 10, 11\}^i$. Then the sequence $\mathcal{F}_0, \dots, \mathcal{F}_n$ forms a filter. In the σ-field $\mathcal{F}_i$, the only valid events are the ones that depend on the values of the first $i$ pairs, and all such events are valid within.


The events that we define next and their interactions are shown in Figure 8.3.2. We show that, with high probability, all of the $O(2^{2N})$ random variables $\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)$, as in (8.1.5), one corresponding to each balanced $(S, \bar{S})$, are positive.

What we do is the following: we initially confine ourselves to a good subspace by excluding any bad node event. We then use the union bound to bound the probability of any bad score event, where a single bad score event occurs when $\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) < 0$ for a particular balanced cut $(S, \bar{S})$.

Each time we examine $\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)$ for a particular balanced cut $(S, \bar{S})$, we let the vector $(H_1, \dots, H_{2KN})$ record the entire history of random unordered pairs of bits, where $(H_1, \dots, H_{2KL})$ records the partial history of unordered pairs for the $2L$ swapped nodes corresponding to $(S, \bar{S})$. Let $\ell = 2KL$ be a positive integer. We denote this $2KL$-history with $H^{(\ell)}$. Let $h$ be a fixed possible $\ell$-history.

We use the bounded differences method to bound a single bad score event over $(S, \bar{S})$: $\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) < 0$. Our starting point is after we reveal the $2KL$ unordered pairs and obtain a $2KL$-history $h$. For simplicity of analysis, we first expand the confined subspace given $h$, by dropping constraints on the $2(N-L)$ unswapped nodes. In this expanded subspace, we only require that the first $2L$ swapped nodes are good nodes, a condition that we denote with $\overline{\mathcal{E}}_1^L(S, \bar{S})$, while leaving the remaining $2(N-L)$ unswapped nodes to take completely random bits according to their distributions; that is, these nodes can be bad nodes. We then obtain a bound on the concentration of $\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)$ in this expanded probability space given $h$. Eventually we map the probability of the bad score event that corresponds to the random variable $\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)$ from this expanded probability space back to the original confined subspace given $h$.

For now, let us first introduce some notation for convenience and see how we expand the subspace given a particular history $h$.
We call the remaining $2K(N-L)$ unordered pairs the $2K(N-L)$-future. Let $f = (H_{2KL+1}, \dots, H_{2KN})$ be a fixed possible $2K(N-L)$-future.

Let $\Omega_h$ denote the event that we observe this particular $2KL$-history: $\Omega_h = \{\omega \in \Omega : H^{(\ell)}(\omega) = h\}$. Given that event $\Omega_h$ occurs, we are concerned about the following probability space $(\Omega_h, \Gamma(\Omega_h), \Pr_h)$, where $\Pr_h$ is the probability measure on $\Omega_h$. Let us use $E_h$ for expectation in this space. Formally,

Definition 8.11. $E_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)] = E[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \mid \mathcal{F}_{2KL}]$ is the expected value of $\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)$ conditioned on an event in $\mathcal{F}_{2KL}$. This conditional expectation $E[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \mid \mathcal{F}_{2KL}]$ is a random variable that can be viewed as a function into the reals from the blocks in the partition of $\mathcal{F}_{2KL}$. Hence $E_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)]$ is an evaluation of this conditional expectation at a particular outcome $h \in \mathcal{F}_{2KL}$.

Thus $(\Omega_h, \Gamma(\Omega_h), \Pr_h)$ corresponds to the expanded subspace given $h$; in this expanded probability space, we can apply the bounded differences method to analyze the probability of a bad score event on $\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)$ for a balanced cut $(S, \bar{S})$ in a clean manner.

In fact, our starting point of the bounded differences analysis is $E_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)]$, where $h$ is a fixed possible $2KL$-history that we record while revealing all $2KL$ random unordered pairs on the $2L$ swapped nodes for $(S, \bar{S})$, subject to $h \in \overline{\mathcal{E}}_1^L(S, \bar{S})$. This immediately indicates that the conditional expected value $E_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)] \ge 2(N-L)LK\gamma$, which is our "advantageous base point" given that $\Omega_h$ occurs. Now as we reveal one by one the future $2K(N-L)$ random unordered pairs, the conditional expected values $E_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \mid H^{(\ell')}]$, $\forall \ell' \ge 2KL$, form a martingale that is amenable to the bounded differences analysis.
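For reference, the standard form of the Azuma-Hoeffding inequality that underlies the bounded differences method is recalled below. This is the textbook statement, given here as a reminder under generic notation ($X_i$, $c_i$, $m$, $t$ are placeholders); the constants actually used in this chapter's analysis may differ.

```latex
% Azuma-Hoeffding inequality (textbook form).
% Let X_0, X_1, ..., X_m be a martingale with bounded differences
% |X_i - X_{i-1}| <= c_i for all 1 <= i <= m. Then for every t > 0,
\[
  \Pr\left[ X_m - X_0 \le -t \right]
  \;\le\; \exp\!\left( - \frac{t^2}{2 \sum_{i=1}^{m} c_i^2} \right).
\]
% Applied with X_0 equal to the conditional expectation of diff given h,
% X_m equal to diff itself, and t = X_0 > 0, the event {diff < 0}
% is contained in {X_m - X_0 <= -X_0}, so
\[
  \Pr\left[ X_m < 0 \right]
  \;\le\; \exp\!\left( - \frac{X_0^2}{2 \sum_{i=1}^{m} c_i^2} \right).
\]
```

The bound is useful only when the sum of squared differences $\sum_i c_i^2$ is small relative to the square of the base point $X_0$, which is why both a large base point and small per-step differences matter.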

This naturally brings up the second bad event $\mathcal{E}_2^L$ that we need to further exclude from the $2KL$-history $h$, while examining a balanced cut $(S, \bar{S})$ in probability space $\Omega$. $\mathcal{E}_2^L$ refers to the event that large deviation occurs simultaneously across a set of $K$ random variables, where the $k$-th random variable is defined over the $2L$ unordered pairs of bits at locus $k$ across the $2L$ swapped nodes.

Note that despite the additional constraint $h \in \overline{\mathcal{E}}_1^L \cap \overline{\mathcal{E}}_2^L$ that we put on $h$, $E_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)]$ remains lower bounded by $2(N-L)LK\gamma$, given that $h \in \overline{\mathcal{E}}_1^L$ and that $E_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)]$ is a random variable whose outcome is entirely determined by $h$ (see Proposition 8.2).
However, excluding $\mathcal{E}_2^L$ from $h$ is crucial in bounding the difference that each of the $2(N-L)K$ future random pairs of bits causes when we work in probability space $(\Omega_h, \Gamma(\Omega_h), \Pr_h)$, where the difference refers to $E_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \mid H^{(\ell')}] - E_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \mid H^{(\ell'-1)}]$, where $2KN \ge \ell' > 2KL$ depends on the particular pair of bits, such that the square sum of all these differences is not too big. This allows us to bound the probability of a bad score event, i.e., $\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) < 0$, using Azuma's inequality in probability space $(\Omega_h, \Gamma(\Omega_h), \Pr_h)$, given that $\Omega_h$ occurs, where $h \in \overline{\mathcal{E}}_1^L \cap \overline{\mathcal{E}}_2^L$.

After we obtain $\Pr_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) < 0]$ in probability space $(\Omega_h, \Gamma(\Omega_h), \Pr_h)$, where $h \in \overline{\mathcal{E}}_1^L \cap \overline{\mathcal{E}}_2^L$ and $f$ is entirely at random, we can calculate $\Pr_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) < 0]$ given that $h \in \overline{\mathcal{E}}_1^L \cap \overline{\mathcal{E}}_2^L$ and $f \in \overline{\mathcal{E}}_1^{N-L}$, where $\overline{\mathcal{E}}_1^{N-L}$ denotes the event that the $2(N-L)$ unswapped nodes contain no bad node event either; hence the latter conditions imply that all nodes are drawn from $\overline{\mathcal{E}}_1$.

Since $\Pr[\mathcal{E}_1^{N-L}]$ is small, its influence on $\Pr_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) < 0]$ is small; that is, given that $\Omega_h$ occurs, where $h \in \overline{\mathcal{E}}_1^L \cap \overline{\mathcal{E}}_2^L$, $\Pr_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) < 0]$ remains small regardless of whether $f$ stays in this confined future subspace $\overline{\mathcal{E}}_1^{N-L}$ or is entirely at random as in $(\Omega_h, \Gamma(\Omega_h), \Pr_h)$.

Let us map this notation to what we have defined in Section 7.3 regarding the expected difference of two conditionally independent random variables as follows, where expectations are taken over all $2K(N-L)$ random unordered pairs of bits on $X_i, Y_i$, $\forall i$, after fixing swapped nodes $U_j, V_j$, $\forall j \in [1, L]$, for the given balanced cut.

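Proposition 8.2. We work in probability space $(\Omega, \mathcal{F}, \Pr)$. $E_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)]$ is a function of $h$. Hence, as a random variable according to Definition 8.11, $E_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)]$ has a value that depends on the random unordered pairs on the $2L$ swapped nodes that we record in $h = \{U_1, \dots, U_L, V_1, \dots, V_L\}$:

\begin{align*}
E_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)] &= \sum_{j=1}^{L}\sum_{i=1}^{N-L} \mathrm{diff}(U_j) + \sum_{j=1}^{L}\sum_{i=1}^{N-L} \mathrm{diff}(V_j) \\
&= (N-L)\sum_{j=1}^{L}\sum_{k=1}^{K} 2(p_2^k - p_1^k)\big(f^k(U_j) - f^k(V_j)\big),
\end{align*}

where $\mathrm{diff}(U_j)$ and $\mathrm{diff}(V_j)$ are defined in Definition 7.6.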
Proof. For each balanced cut $(S, \bar{S})$, we could pair up $X_i, Y_i$, $\forall i \in [1, N-L]$, via an arbitrary matching between the two sets of unswapped nodes. Note that the actual choice of $i$ for $X_i, Y_i$ does not influence the value of the random variable $\mathrm{diff}(U_j)$, which is uniquely determined by the $K$ unordered pairs of bits on node $U_j$.

Given (8.1.5) and linearity of expectations, we have

E,  [diff('r,(5,5),L)]  =  E[diff('r,(5,5-),L)|f/,-  =  Uj.Vj  =  €  [1,L]] 

L  N-L 

=  X  L  [Pscore(t7^',E)]  -E^.  [Pscore(Lfy,X,)]  + 

7=1  1=1 
L  N-L 

L  L  Ex,  [Pscore(F,',X,)]  -Ey,  [Pscore(Fy,E)] 

7  =  1  1=1 

L  N-L  L  N-L 

=  £  £diff(c;,)  +  £  £diff(v,) 

7=1  1=1  7=1  1=1 


L  K 


=  («-/.)££  2(/>l  -  p'iWHui)  -fm, 

7  =  U=1 

(8.3.8) 

where  the  last  equation  is  due  to  (7.3.15)  and  (7.3.16),  given  that  Vy,  U j 
Vj  ~  P2, 

~  Pi  and 

dmUj)  =  f  2(p|-p^)/(f/y)  +  5, 

k=l 

(8.3.9) 

diff(F,)  =  £2(pf-p^)/(F,)-5. 

(8.3.10) 

k=l 


□ 

Remark  8.3.  Hence  E^  [c//ff(T,  (5,5),L)],  as  a  function  of  is  evaluated  to  a 
unique  value,  i.e.,  the  value  of{N  —  L)  Ly=i  Lf=i  ^{P2~ eval¬ 
uated  at  outcome  {t/i, . . . ,  Gl^Fi,  •  •  •  ^Fl}- 

8.4  Excluding  Two  Bad  Events 

There are two lemmas regarding two types of bad events that we exclude from the set of all possible $h$ that we consider in order to apply the bounded differences analysis. In more detail, after we record the $\ell$-history $h$ by revealing the bits of $U_j, V_j$, $\forall j \in [1, L]$, we can observe that with high probability, the history $h$ excludes two types of bad events, for all balanced $(S, \bar{S})$.

- The first bad event is $\mathcal{E}_1$, which is defined in Definition 8.13. We show that, with high probability, none of the $2N$ bad node events (Definition 8.12), each regarding a single node's behavior across its $K$ unordered pairs of bits, happens in the product probability space $(\Omega, \mathcal{F}, \Pr)$.

In Section 8.4.1, we show that for all balanced cuts $(S, \bar{S})$, when evaluated at a particular history $h \in \mathcal{F}_{2KL}$ that is determined by the $2KL$ unordered pairs on the $2L$ swapped nodes drawn from $\overline{\mathcal{E}}_1^L$,

\[
E_h\big[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \mid h \in \overline{\mathcal{E}}_1^L,\ f \text{ at random}\big] \ge 2KL(N-L)\gamma, \tag{8.4.11}
\]

in an expanded subspace where $f$ is assumed to be at random, given $h \in \overline{\mathcal{E}}_1^L$.

- The second type of bad event, $\mathcal{E}_2^L$ as in Definition 8.15, concerns simultaneously large deviation across a set of $K$ random variables defined over the $2L$ swapped nodes and across their $K$ loci. The idea is that, at each locus, we may observe a certain large deviation from the expected bit pattern across $2L$ individuals in the sense of Definition 8.14; however, as we show in Lemma 8.4, with high probability, such deviation across all $K$ loci cannot be simultaneously large, due to the mutual independence assumption that we make across the $K$ loci. We use the union bound to bound over all balanced $(S, \bar{S})$.

(8.4.11) still holds given that the random variable $E_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)]$ is a function of $h \in \overline{\mathcal{E}}_1^L \cap \overline{\mathcal{E}}_2^L$, with $f$ entirely at random.

We formally define these two events with the following definitions.

Definition 8.12. (Bad Node Event $\mathcal{E}(Z)$) Let a bad node event $\mathcal{E}(Z)$ be the event that $\mathrm{diff}(Z) < K\gamma$, where $Z$ is one sample point. Note this is an event in an individual probability space $(\Omega_Z, \mathcal{F}_Z, \Pr_Z)$, where $(\Omega_Z, \mathcal{F}_Z, \Pr_Z)$ is defined over all possible outcomes for $K$ random unordered pairs of bits for an individual $Z$.

Note that all bad node events are mutually independent.

From now on, we use $(\Omega_i, \mathcal{F}_i, \Pr_i)$ to refer to $(\Omega_{Z_i}, \mathcal{F}_{Z_i}, \Pr_{Z_i})$ for the input $2N$ nodes, assuming that we are given a unique ordered list $(Z_1, \dots, Z_{2N})$.

Definition 8.13. (Bad Event $\mathcal{E}_1$) $\mathcal{E}_1$ is the same as $\mathcal{E}(Z_1) \cup \dots \cup \mathcal{E}(Z_{2N})$ in the product probability space $(\Omega, \mathcal{F}, \Pr)$ composed of distinct probability spaces $(\Omega_1, \mathcal{F}_1, \Pr_1), \dots, (\Omega_{2N}, \mathcal{F}_{2N}, \Pr_{2N})$ as in Definition 8.12. Hence $\overline{\mathcal{E}}_1$ is the same as the joint event $\overline{\mathcal{E}}(Z_1) \cap \dots \cap \overline{\mathcal{E}}(Z_{2N})$ in $(\Omega, \mathcal{F}, \Pr)$.

Let $(S, \bar{S})$ denote a balanced cut with $L$ swapped nodes on each side for some $L \in [1, N/2]$. We proceed to define $\mathcal{E}_2^L$ given a balanced cut $(S, \bar{S})$. We first define a set of $K$ random variables and their deviation, each regarding the $2L$ unordered pairs across the $2L$ swapped nodes at a particular locus $k$, $\forall k$. Again let $h$ be the $2KL$-history that we record after revealing bits on the $2L$ swapped nodes in $(S, \bar{S})$.

Definition 8.14.  (Deviation  Values)  'ik=  let tts/L be  the  exaef  deviation 

of  the  following  random  variable  that  we  observe  over  h^  which  we  denote  with 
/|(/i),  i.e.,  /|(/i)  —  E[/|(^)]  =  tks/L,yk,  where 

fiih)  =  /|(f/i,...,f/L,Vi,...,VL) 

=  -  (4(^i)  -/n(V,')) 

j=i 

j=l 

and  /^(f/i), . . .  ,/^(f7i),/^(Vi), . . .  ,/^(Vl)  are  all  random  variables  in  range 
[—1,1],  as  defined  in  Definition  7.4. 

First  let  us  obtain  the  expeeted  value  of  f^Ol). 

Proposition  8.3.  For  f^ih)  as  in  Definition  8.14, 

e[/|(^)1  =  E  f  [(4(1/2) -4(1/;))- (4(^;)-4('6-))’ 

L/=i 

j=i  ;=i 

=  2L{pi-p\). 
Next we bound the deviation of each random variable $f_2^k(h)$, as defined in Definition 8.14.

Lemma  8.1.  Mk,  for  random  variable  have 


Pr 


/l(^)-E  /l(^) 


>  tkVL\  < 


In  addition,  events  corresponding  to  different  loci  are  independent. 
Proof.  Let  us  define  random  variables  sueh  that 

f^{h)=L{u’^-V% 


(8.4.12) 


(8.4.13) 


and 


LfiUj)/L, 


j=i 


V’^ 


L 


/-I 


Thus  by  Proposition  8.3, 


E 


-E 


=  -E 
L 


f2m=2{p^2-p\)- 


(8.4.14) 


In  order  to  bound  probability  of  deviation  on  both  sides  of  the  expeeted  differ- 
enees,  we  let  t  =  tus/P/P  and  apply  Corollary  7.1  of  Theorem  7.1, 


Pr 


/|(^)-E  f^{h) 


>  tkVp 


Pr 


ijk-yk-  (g 


-E 


)  >  tks/PjP 


-2(t^VL/Lf 

<  2e  (2T)(2)^ 


(8.4.15) 

(8.4.16) 

(8.4.17) 


□ 

Definition 8.15. (Bad Deviation Event $\mathcal{E}_2^L$) In probability space $(\Omega, \mathcal{F}, \Pr)$, given a balanced cut $(S, \bar{S})$ and its corresponding $2KL$-history $h$, $\mathcal{E}_2^L$ is the event such that the set of random variables $t_1, \dots, t_K$ regarding the $2KL$ unordered pairs recorded in $h$, as defined in Definition 8.14, are simultaneously large and satisfy

\[
\sum_{k=1}^{K} t_k^2 \ge \Delta = 32N\ln 2 + 16K\ln 2\,(\log\log N + 1) + 6\ln N.
\]


Using Definitions 8.14 and 8.15, we immediately have the following lemma, which we use in Section 8.5.

Lemma  8.2.  Given  that  h  E  Hf,  we  have  VL 


|/|(A)|  <  e[/|(a)1  +  ttVl 


and 

k=l 

where  tk  is  defined  in  Definition  8.14,  and  the  bad  deviation  event  “£2  is  given  in 
Definition  8.15. 

Proof.  By  definition  of  tuflk,  we  have  that  =  E[/|(A)]  +tkVL,  where  t).  E 

r-2L-E[/|(Ul  2L-E[/|(Uli 

L  vT  ’  vT 

Thus  we  immediately  have  |/|(A)  |  <  |E  [/|(A)]  |  +  Iftv/Ll,  where  Y,k=\ ^k  — 
given  that  h  E  □ 

We are now ready to bound the probability of events $\mathcal{E}_1$ and $\mathcal{E}_2^L$. We want to emphasize that we exclude $\mathcal{E}_1$ once for all $2N$ nodes, while excluding one $\mathcal{E}_2^L$ from each balanced cut $(S, \bar{S})$, where $L$ denotes that the event $\mathcal{E}_2^L$ is defined over the particular set of $2KL$ unordered pairs across $K$ loci on the $2L$ swapped nodes in $(S, \bar{S})$; we have $\binom{N}{L}^2$ such events for each $L$, whose probabilities we sum up later using the union bound.

Hence for all $h$ that we deal with later in the bounded differences analysis, neither of the two types of bad events happens.

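Lemma 8.3. Let $K \ge 256\ln N/\gamma$; in probability space $(\Omega, \mathcal{F}, \Pr)$, $\Pr[\mathcal{E}_1] \le p_1 = 2N/N^{32}$. Thus, $\Pr[\overline{\mathcal{E}}_1] \ge 1 - 2N/N^{32}$.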
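Proof. Apply Lemma 7.3 to each $\mathrm{diff}(Z)$ with $\tau = 1/N^{32}$; given $K \ge 256\ln N/\gamma$, we have $\forall Z$,

\[
\Pr[\mathcal{E}(Z)] \le \frac{1}{N^{32}}.
\]

Given that at equilibrium, each node's bits at locus $k$ are two independent random draws from its distribution $\mathrm{Bernoulli}(p^k_a)$, where $a \in \{1,2\}$ depends on the population of origin for node $Z$, we adopt the view of composing the product space $(\Omega, \mathcal{F}, \Pr)$ through distinct probability spaces $(\Omega_1, \mathcal{F}_1, \Pr_1), \dots, (\Omega_{2N}, \mathcal{F}_{2N}, \Pr_{2N})$ as in Definition 8.13, where $(\Omega_i, \mathcal{F}_i, \Pr_i)$, $\forall i$, is defined over all possible outcomes for $K$ random unordered pairs for individual $Z_i$.

Then for events $\mathcal{E}(Z_1) \in \mathcal{F}_1, \dots, \mathcal{E}(Z_{2N}) \in \mathcal{F}_{2N}$, the probability of the joint event $(\mathcal{E}(Z_1), \dots, \mathcal{E}(Z_{2N}))$ is

\[
\Pr[\mathcal{E}(Z_1) \cap \mathcal{E}(Z_2) \cap \dots \cap \mathcal{E}(Z_{2N})] = \Pr_1[\mathcal{E}(Z_1)] \cdot \Pr_2[\mathcal{E}(Z_2)] \cdots \Pr_{2N}[\mathcal{E}(Z_{2N})], \tag{8.4.18}
\]

where the product corresponds to performing independent experiments with respect to each of the $2N$ probability spaces $(\Omega_i, \mathcal{F}_i, \Pr_i)$, $\forall i$, given a fixed ordering $(Z_1, \dots, Z_{2N})$ of the input nodes.

Therefore, by definition,

\begin{align*}
\Pr[\overline{\mathcal{E}}_1] &= \Pr[\text{none of } \mathcal{E}(Z) \text{ happens, for all nodes } Z] \tag{8.4.19} \\
&= \Pr[\overline{\mathcal{E}}(Z_1) \cap \overline{\mathcal{E}}(Z_2) \cap \dots \cap \overline{\mathcal{E}}(Z_{2N})] \tag{8.4.20} \\
&= \Pr_1[\overline{\mathcal{E}}(Z_1)] \cdot \Pr_2[\overline{\mathcal{E}}(Z_2)] \cdots \Pr_{2N}[\overline{\mathcal{E}}(Z_{2N})] \tag{8.4.21} \\
&= (1 - \Pr_1[\mathcal{E}(Z_1)]) \cdot (1 - \Pr_2[\mathcal{E}(Z_2)]) \cdots (1 - \Pr_{2N}[\mathcal{E}(Z_{2N})]) \\
&\ge \Big(1 - \frac{1}{N^{32}}\Big)^{2N} \ge 1 - \frac{2N}{N^{32}}. \tag{8.4.22}
\end{align*}

□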
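The last step of this chain is an instance of Bernoulli's inequality, $(1 - a)^m \ge 1 - ma$ for $0 \le a \le 1$. The sketch below checks it with exact rational arithmetic for a few values of $N$; the function name and the default exponent of 32 (matching the per-node failure probability $1/N^{32}$) are assumptions made for this illustration.

```python
from fractions import Fraction


def no_bad_node_lower_bounds(n, exponent=32):
    """Return (exact, linear) lower bounds on Pr[no bad node event].

    exact  = (1 - 1/n**exponent) ** (2n)   product of 2n per-node bounds
    linear = 1 - 2n / n**exponent          Bernoulli's inequality
    """
    tau = Fraction(1, n ** exponent)
    exact = (1 - tau) ** (2 * n)
    linear = 1 - 2 * n * tau
    return exact, linear
```

For any $N \ge 2$ the first returned value is at least the second, and the second simplifies to $1 - 2/N^{31}$, so the probability that some bad node event occurs is polynomially small in $N$.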


Lemma 8.4. In probability space $(\Omega, \mathcal{F}, \Pr)$, for each balanced cut $(S, \bar{S})$,

\[
\Pr[h \in \mathcal{E}_2^L] \le p_2,
\]

where $p_2 = \frac{1}{2^{2N}\,\mathrm{poly}(N)}$, given $N \ge 8$.
Proof.  To  facilitate  our  proof,  we  obtain  a  set  of  nonnegative  numbers  (fi, . . .  ,4) 
as  follows;  Vfc,  to  obtain  4,  we  round  \tk\  down  to  nearest  nonnegative  number  |4| 
that  is  power  of  two. 

It  is  easy  to  verify  that  C  ^  Proposition  8.3. 

Thus  we  have  4  <  \tk\  <  |2-\/l|  +  . 

Let  us  divide  the  entire  range  of  \tk  into  intervals  using  power-of-2  non¬ 
negative  integers  as  dividing  points;  Let  r^,V^  represent  the  number  of  such  in¬ 
tervals:  we  have  Mk,  so  long  as  >  8, 

rk  =  log(|2v/L|  +  |2L(p^-p|)/v/L|)  (8.4.23) 

<  log4\/L<log4v/i^<logA^.  (8.4.24) 

Thus  we  have  at  most  (logA^^)^  blocks  in  the  A'-dimensional  space  such  that 
each  block  along  each  dimension  is  a  subinterval  of  0,  |2VL|+  ■ 

Let $B(P_1, \dots, P_K)$ represent a block in the $K$-dimensional space, where $P_1, \dots, P_K$ are nonnegative power-of-2 integers and every point in $B(P_1, \dots, P_K)$ has its value fixed in the interval $[P_k, 2P_k)$ along dimension $k$, $\forall k$; hence $(P_1, \dots, P_K)$ is the point in the $K$-dimensional space with the smallest coordinate in every dimension in $B(P_1, \dots, P_K)$.

A set of values $(t_1, \dots, t_K)$ as in Definition 8.14 is mapped into one of these blocks uniquely as follows. We say a point $(t_1, \dots, t_K)$ maps to $B(P_1, \dots, P_K)$ if $\forall k$, $2P_k > |t_k| \ge P_k$, i.e., the rounded-down values of $(|t_1|, \dots, |t_K|)$ equal $(P_1, \dots, P_K)$.
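The mapping of a deviation vector to its block can be sketched as follows. This is a simplified illustration, not the original construction in full: the function name `block_corner` is hypothetical, and the sketch assumes every $|t_k| \ge 1$, leaving aside how smaller values are rounded.

```python
import math


def block_corner(ts):
    """Round each |t_k| down to the nearest power of two.

    Returns the corner (P_1, ..., P_K) of the block that the point maps to.
    Assumption of this sketch: every |t_k| is at least 1.
    """
    corner = []
    for t in ts:
        magnitude = abs(t)
        if magnitude < 1:
            raise ValueError("sketch assumes |t_k| >= 1")
        # frexp gives magnitude = m * 2**e with 0.5 <= m < 1,
        # so the largest power of two not exceeding magnitude is 2**(e - 1).
        corner.append(2 ** (math.frexp(magnitude)[1] - 1))
    return corner
```

For instance, `block_corner([1, 3.7, -8, 15.99])` returns `[1, 2, 8, 8]`. Since each coordinate satisfies $P_k \le |t_k| < 2P_k$, the corner always satisfies $\sum_k P_k^2 > \sum_k t_k^2 / 4$, which is why a bound on $\sum_k t_k^2 \ge \Delta$ can be traded for a bound on blocks with $\sum_k P_k^2 \ge \Delta/4$.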

We first bound the following event using Lemma 8.5. Let us fix one block $B(P_1, \dots, P_K)$ for a fixed set of values $P_1, \dots, P_K$ such that $\sum_{k=1}^{K} P_k^2 \ge \Delta/4$.

Lemma 8.5. Let $\Delta/4 = 8N\ln 2 + 4K(\ln 2)(\log\log N + 1) + (3\ln N)/2$, as $\Delta$ is defined in Definition 8.15. Then

\[
\Pr\Big[h \text{ maps to a particular } B(P_1, \dots, P_K) \text{ s.t. } \sum_{k=1}^{K} P_k^2 \ge \Delta/4\Big] \le \frac{1}{2^{2N} \cdot (\log N)^K \cdot N^{3/2}}.
\]

Proof.  Lef  tiv/T,  •  •  -fiks/P  be  fhe  deviafion  fhaf  we  observe  in  h  for  random  vari¬ 
ables  /2  (4)5/2  (^)5 5/2 (^)  Definition  8.14.  If  coordinates  {t\^...fk)  of  h 
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maps  to  (Pi, . . . , P^;),  we  know  that  Vk,2P;i  >  \tk\  >  P/t  given  the  definition  of 

B(Pi,...,p,). 

In  addition,  by  Lemma  8.1,  we  know  that 

Pr  [|/|(^)  -  E  [/|(^)]  I  >  P,v/l]  <  (8.4.25) 

and  events  eorresponding  to  different  loei  are  independent;  Thus  we  have 
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Thus  we  have 


'  K 

1 

£4^>a 

<  Pr 

M 

IV 

_k=l 

lk=i  j 

(8.4.30) 


Pr 


K 

^maps  to  some  B(Pi,. . .  jP/t)  s.t.  ^  Pf  >  A/4 

k=\ 


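This allows us to upper bound $\Pr[\mathcal{E}_2^L]$ with events regarding $\sum_{k=1}^{K} P_k^2$ as follows:

\begin{align*}
\Pr[\mathcal{E}_2^L] &= \Pr\Big[\bigcap_{k=1}^{K} \big(f_2^k(h) - E[f_2^k(h)] = t_k\sqrt{L}\big) \text{ s.t. } \sum_{k=1}^{K} t_k^2 \ge \Delta\Big] \tag{8.4.31} \\
&\le \Pr\Big[h \text{ maps to some } B(P_1, \dots, P_K) \text{ s.t. } \sum_{k=1}^{K} P_k^2 \ge \Delta/4\Big] \\
&\le \frac{(\log N)^K}{2^{2N} \cdot (\log N)^K \cdot N^{3/2}} \tag{8.4.32} \\
&\le \frac{1}{2^{2N}\,\mathrm{poly}(N)}. \tag{8.4.33}
\end{align*}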

Hence the probability that the 2KL unordered pairs induce simultaneously large
deviation for random variables $f_2^1(h), \ldots, f_2^K(h)$, as in Definition 8.15, is at most

$$p_2 = \frac{1}{2^{2N}\,\mathrm{poly}(N)}.$$

This immediately implies the following corollary.


Corollary 8.1. For all balanced $(S, \bar{S})$ with $L$, $\forall\, 0 < L \le N/2$, swapped nodes on
each side, with probability $1 - 1/\mathrm{poly}(N)$ in probability space $(\Omega, \mathcal{F}, \Pr)$, the set of
$t_1, \ldots, t_K$ as defined in Definition 8.14 over 2KL unordered pairs satisfies $\sum_{k=1}^{K} t_k^2 \le \Delta$.


Hence, we have an advantageous bounded differences case as we enter the
expanded probability space $\Omega_h$ by excluding $h$ that belong to $\mathcal{E}_1^{L}$ or $\mathcal{E}_2^{L}$ for all
balanced $(S, \bar{S})$.


8.4.1  Preparation  for  Landing  in  Expanded  Subspaces 

We first give two definitions.

Definition 8.16. $\bar{\mathcal{E}}_1^{L}(S, \bar{S})$ is the same as $\overline{\mathcal{E}(U_1) \cup \ldots \cup \mathcal{E}(U_L) \cup \mathcal{E}(V_1) \cup \ldots \cup \mathcal{E}(V_L)}$
in the product probability space composed of distinct probability spaces defined
over nodes $U_1, \ldots, U_L, V_1, \ldots, V_L$ as in Definition 8.12.

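Definition 8.17. $\bar{\mathcal{E}}_1^{N-L}(S, \bar{S})$ is the same as $\overline{\mathcal{E}(X_1) \cup \ldots \cup \mathcal{E}(X_{N-L}) \cup \mathcal{E}(Y_1) \cup \ldots \cup \mathcal{E}(Y_{N-L})}$
in the product probability space composed of distinct probability spaces
defined over nodes $X_1, \ldots, X_{N-L}, Y_1, \ldots, Y_{N-L}$ as in Definition 8.12.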

Hence $\bar{\mathcal{E}}_1^{L}$ and $\bar{\mathcal{E}}_1^{N-L}$ imply that no bad node event happens in the appropriate
product spaces thus defined. We omit $(S, \bar{S})$ from $\bar{\mathcal{E}}_1^{L}(S, \bar{S})$ and $\bar{\mathcal{E}}_1^{N-L}(S, \bar{S})$ when it
is clear from the context.

Given a balanced cut $(S, \bar{S})$, $h$ records a history on the 2KL unordered pairs on
swapped nodes $U_1, \ldots, U_L, V_1, \ldots, V_L$.

Proposition 8.4. Given that all nodes are drawn from $\bar{\mathcal{E}}_1$, for any balanced cut $(S, \bar{S})$,
the particular 2KL-history $h$ that we record satisfies $h \in \bar{\mathcal{E}}_1^{L}(S, \bar{S})$.

Proof. Given $\bar{\mathcal{E}}_1$, we know that the joint event $(\bar{\mathcal{E}}(Z_1), \ldots, \bar{\mathcal{E}}(Z_{2N}))$ must happen
in the product probability space $(\Omega, \mathcal{F}, \Pr)$. Hence for all nodes $Z_1, \ldots, Z_{2N}$,

$$\mathrm{diff}(Z_i) \ge K\gamma \qquad (8.4.34)$$

simultaneously in the product probability space $(\Omega, \mathcal{F}, \Pr)$, where $\mathrm{diff}(Z_i)$ is a random
variable solely determined by node $Z_i$'s bits across $K$ loci, before or after we
ever reveal it.

In particular, for each balanced $(S, \bar{S})$, we focus on the product probability
space that is composed of distinct probability spaces defined over swapped nodes
$U_1, \ldots, U_L, V_1, \ldots, V_L$ as in Definition 8.16. After we reveal these 2KL bits on nodes
$U_j, V_j$, $\forall j = 1, \ldots, L$, by (8.4.34),

$$\mathrm{diff}(U_j) \ge K\gamma, \quad \forall j = 1, \ldots, L, \qquad (8.4.35)$$

$$\mathrm{diff}(V_j) \ge K\gamma, \quad \forall j = 1, \ldots, L. \qquad (8.4.36)$$

Thus we have $h \in \bar{\mathcal{E}}_1^{L}(S, \bar{S})$.

□

Definition 8.18. We use $f$ to denote the future of the $2(N - L)K$ random unordered
pairs that we are going to reveal for the unswapped nodes on a given balanced cut
$(S, \bar{S})$. Recall that once we are fixed to the probability space such that $\mathcal{E}_1$ does not
happen, we know that both $h$ and $f$ are confined; the following two notations are
equivalent:

$$\left(h \in \bar{\mathcal{E}}_1^{L}(S, \bar{S})\right) \cap \left(f \in \bar{\mathcal{E}}_1^{N-L}(S, \bar{S})\right), \qquad (h, f) \in \bar{\mathcal{E}}_1.$$

Remark 8.4. Another way of seeing $\bar{\mathcal{E}}_1^{L}(S, \bar{S})$ (with respect to a particular balanced
cut $(S, \bar{S})$) is to view it as an event in the simple probability space $(\Omega, \mathcal{F}, \Pr)$,
such that we put constraints only on the specific 2L swapped nodes defined on $(S, \bar{S})$
while leaving $f$ at random. Hence we have $\bar{\mathcal{E}}_1 \subseteq \bar{\mathcal{E}}_1^{L}(S, \bar{S})$ in $(\Omega, \mathcal{F}, \Pr)$.

Thus $h$, as a 2KL-history on nodes drawn from $\bar{\mathcal{E}}_1$ (the product probability
space $(\Omega, \mathcal{F}, \Pr)$ excluding $\mathcal{E}_1$), must satisfy $h \in \bar{\mathcal{E}}_1^{L}(S, \bar{S})$.

We leave this confined space given $\bar{\mathcal{E}}_1$ for now and explore the following expanded
subspace, where we require $h \in \bar{\mathcal{E}}_1^{L}$ while leaving the future $f$ at random.
$(\Omega_h, \Sigma(\Omega_h), \Pr_h)$ corresponds to this expanded subspace, where $h \in \bar{\mathcal{E}}_1^{L}$.

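Lemma 8.6. For a balanced cut $(S, \bar{S})$, given a particular 2KL-history $h \in F_{2KL}$ on
the 2L swapped nodes such that $h \in \bar{\mathcal{E}}_1^{L}$,

$$\mathbf{E}_h\left[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \,\middle|\, h \in \bar{\mathcal{E}}_1^{L},\ f \text{ at random}\right] \ge 2L(N - L)K\gamma, \qquad (8.4.37)$$

where the expectation is over all possible outcomes of the $2(N - L)K$ random unordered
pairs in $f$ in probability space $(\Omega_h, \Sigma(\Omega_h), \Pr_h)$.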

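Proof. For a balanced cut $(S, \bar{S})$, given $h \in \bar{\mathcal{E}}_1^{L}$, where $h$ records 2KL bits over
swapped nodes $U_j, V_j$, $\forall j = 1, \ldots, L$, by Definition 8.12,

$$\mathrm{diff}(U_j) \ge K\gamma, \quad \forall j = 1, \ldots, L, \qquad (8.4.38)$$

$$\mathrm{diff}(V_j) \ge K\gamma, \quad \forall j = 1, \ldots, L. \qquad (8.4.39)$$

Thus, in subspace $(\Omega_h, \Sigma(\Omega_h), \Pr_h)$, where $f$ is at random and $h \in \bar{\mathcal{E}}_1^{L}$, we have
from Proposition 8.2,

$$\mathbf{E}_h\left[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)\right] = \sum_{j=1}^{L}\sum_{i=1}^{N-L} \mathrm{diff}(U_j) + \sum_{j=1}^{L}\sum_{i=1}^{N-L} \mathrm{diff}(V_j)$$

$$= (N - L)\sum_{j=1}^{L} \mathrm{diff}(U_j) + (N - L)\sum_{j=1}^{L} \mathrm{diff}(V_j)$$

$$\ge 2L(N - L)K\gamma.$$

□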

□ 

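Recall that $\bar{\mathcal{E}}_2^{L}$ is the event that no simultaneously large deviation happens
across 2L individuals over their 2KL unordered pairs.

Corollary 8.2. Given that $h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L}$, and $f$ is at random:

$$\mathbf{E}_h\left[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \,\middle|\, h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L},\ f \text{ at random}\right] \ge 2L(N - L)K\gamma, \qquad (8.4.40)$$

which holds so long as $h \in \bar{\mathcal{E}}_1^{L}$.

We next bound $\mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)]$ for all balanced $(S, \bar{S})$, where $h$ is confined
in $\bar{\mathcal{E}}_1^{L}$ and $\bar{\mathcal{E}}_2^{L}$, before we enter each individually expanded subspace
$(\Omega_h, \Sigma(\Omega_h), \Pr_h)$.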

Theorem 8.2. Assume that all nodes in our sample are drawn from $\bar{\mathcal{E}}_1$, the probability
space $(\Omega, \mathcal{F}, \Pr)$ excluding $\mathcal{E}_1$. Then, $\forall$ balanced cut $(S, \bar{S})$, where $h$ is
a particular 2KL-history that corresponds to the 2L swapped nodes specified over
$(S, \bar{S})$ with respect to $\mathcal{T}$,

$$\mathbf{E}_h\left[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)\right] \ge 2L(N - L)K\gamma, \qquad (8.4.41)$$

where the conditional expectation is over each of the individually expanded probability
spaces $(\Omega_h, \Sigma(\Omega_h), \Pr_h)$, given $h \in \bar{\mathcal{E}}_1^{L}$.

This statement remains true after we require that $h \in \bar{\mathcal{E}}_2^{L}$ in addition.

Proof. By Proposition 8.4, for each balanced cut $(S, \bar{S})$, we have

$$h \in \bar{\mathcal{E}}_1^{L}(S, \bar{S}). \qquad (8.4.42)$$

Now apply Corollary 8.2: given that $h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L}$, we immediately have the
theorem. □

Remark 8.5. $\mathrm{diff}(Z)$ is determined by node $Z$'s bit pattern, which is the same when
we observe it from every balanced cut, where it acts as a swapped node. Hence
although we do have $O(2^{2N})$ balanced cuts, $\mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)]$ for all balanced
cuts are just determined by the 2N random variables $\mathrm{diff}(Z_1), \ldots, \mathrm{diff}(Z_{2N})$, each
of which is determined by the genotype of an individual in our sample.

Hence, during the entire analysis of $O(2^{2N})$ balanced cuts, we may reveal a
node $Z$ in many cuts, but every time we reveal it, it is the same node; and the random
bits at each locus $k$, $\forall k$, are just random draws from their corresponding distribution
(e.g., Bernoulli($p_a^k$) for an individual from population $a$, $\forall a = 1, 2$), before we start
to reveal them in any cut, or after we have revealed them many times.

After we exclude the bad event $\mathcal{E}_1$ from probability space $(\Omega, \mathcal{F}, \Pr)$, given
any 2KL-history $h$ that corresponds to a balanced cut $(S, \bar{S})$, we have an advantageous
"base point" as we enter an individually expanded probability space
$(\Omega_h, \Sigma(\Omega_h), \Pr_h)$, where $h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L}$. This is true for all balanced cuts.

8.5  The  Bounded  Differences  Approach  in  an  Expanded  Subspace 

In this section, we bound the deviation of random variable $\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)$ for a
particular balanced cut $(S, \bar{S})$; recall that we let vector $(H_1, \ldots, H_{2KN})$ record the
entire history of random unordered pairs that we see, where $(H_1, \ldots, H_{2KL})$ record
the 2KL-history on 2L swapped nodes.

First it is convenient to introduce some more notation: For $\ell > 2KL$, we
begin to reveal the random unordered pairs on unswapped nodes in $(S, \bar{S})$.
The random variable $\mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \mid H^{(\ell)}]$ depends on the random extension
of $h$ observed. By definition, $\mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \mid H^{(\ell)}](\pi) =
\mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \mid H^{(\ell)} = h^{(\ell)}]$ for $\pi \in \Omega_h$, where $h^{(\ell)} = H^{(\ell)}(\pi)$; another notation
for this is $\mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \mid \mathcal{F}]$, where $\mathcal{F}$ is the σ-field generated by $H^{(\ell)}$
restricted to $\Omega_h$. To prove the theorem, we introduce the following.

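Lemma 8.7. (Azuma's Inequality) Let $Z_0, Z_1, \ldots, Z_m = f$ be a martingale on
some probability space, and suppose that $|Z_i - Z_{i-1}| \le c_i$, $\forall i = 1, 2, \ldots, m$. Then

$$\Pr\left[\,|f - \mathbf{E}[f]| \ge t\,\right] \le 2e^{-t^2/2\sigma^2},$$

where $\sigma^2 = \sum_{i=1}^{m} c_i^2$.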
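As a small numerical illustration of Lemma 8.7 (not part of the original analysis), the sketch below evaluates the Azuma bound for the simplest martingale with bounded differences, a symmetric ±1 random walk with $c_i = 1$, and compares it against a simulated tail frequency. The function names `azuma_bound` and `empirical_tail` are hypothetical helpers introduced only for this example.

```python
import math
import random

def azuma_bound(t, c):
    """Right-hand side 2*exp(-t^2 / (2*sigma^2)), with sigma^2 = sum of c_i^2."""
    sigma2 = sum(ci * ci for ci in c)
    return min(1.0, 2.0 * math.exp(-t * t / (2.0 * sigma2)))

def empirical_tail(m, t, trials, seed=0):
    """Fraction of m-step +/-1 random walks whose endpoint deviates from 0 by at least t."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        endpoint = sum(rng.choice((-1, 1)) for _ in range(m))
        if abs(endpoint) >= t:
            hits += 1
    return hits / trials

# Partial sums of the walk form a martingale with |Z_i - Z_{i-1}| = 1 and E[f] = 0.
bound = azuma_bound(20, [1] * 100)        # sigma^2 = 100, so the bound equals 2*exp(-2)
observed = empirical_tail(100, 20, 2000)  # simulated tail frequency
print(bound, observed)
```

The bound is loose for this walk, which is expected: Azuma's inequality only uses the worst-case step sizes $c_i$, and that is exactly what the analysis below controls through the quantities $d_{i,k}$.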

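Theorem 8.3. Let $h$ be a possible 2KL-history that we record for a balanced cut
$(S, \bar{S})$ such that $h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L}$. Then, for $t > 0$, in probability space $(\Omega_h, \Sigma(\Omega_h), \Pr_h)$
such that all future $2(N - L)K$ unordered pairs in $f$ are completely at random,

$$\Pr_h\left[\,\left|\mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \mid H^{(2KN)}] - \mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)]\right| \ge t\,\right] \le 2e^{-t^2/2\sigma^2},$$

where $\sigma^2 \le 64L^2(N - L)K\gamma + 16L(N - L)\Delta$, for all balanced $(S, \bar{S})$ with $0 < L \le N/2$
swapped nodes.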

Proof. We shall set up things to use Lemma 8.7.

We work in probability space $(\Omega_h, \Sigma(\Omega_h), \Pr_h)$. We start to reveal the $2K(N - L)$
unordered pairs on unswapped nodes that are chosen independently at random,
and rely on 2L swapped nodes having a good history $h$, given that $h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L}$.
Given the σ-field $(\Omega_h, \Sigma(\Omega_h))$, with $\Sigma(\Omega_h) = 2^{\Omega_h}$, let us first define a filter $\mathbb{F}$.
Given the independent random unordered pairs $H_{2KL+1}, \ldots, H_{2KN}$, the filter is
defined by letting $\mathcal{F}_i$, $\forall i = 1, \ldots, m$, where $m = 2K(N - L)$, be the σ-field generated
by histories $H^{(2KL+i)}$. We thus obtain a natural $\mathbb{F}$:

$$\{\emptyset, \Omega_h\} = \mathcal{F}_0 \subset \mathcal{F}_1 \subset \ldots \subset \mathcal{F}_m = 2^{\Omega_h},$$

where for $0 \le i \le m = 2K(N - L)$, $(\Omega_h, \mathcal{F}_i)$ is a σ-field.

Hence $\mathbb{F}$ corresponds to the increasingly refined partitions of $\Omega_h$ obtained from
all the different possible extensions of the 2KL-history $h$.

We obtain a martingale for random variable $\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)$ such that: Let
$Z_0 = \mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)]$ and

$$Z_{\ell - 2KL} = \mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \mid H^{(\ell)}] \qquad (8.5.43)$$

$$= \mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \mid \mathcal{F}_{\ell - 2KL}], \qquad (8.5.44)$$

where $\mathcal{F}_{\ell - 2KL}$ is the σ-field generated by $H^{(\ell)}$ restricted to $\Omega_h$ and $2KN \ge \ell > 2KL$.

We let $H_{2KL+1}, \ldots, H_{2KN}$ map to random unordered pairs on
$X_1^1, \ldots, X_{N-L}^K, Y_1^1, \ldots, Y_{N-L}^K$, where $X_i^k$ or $Y_i^k$ refers to an unordered pair of bits
on locus $k$ on individual $X_i$ or $Y_i$ respectively.

We first define the following, $\forall j = 1, 2, \ldots, m$, where $m = 2K(N - L)$,

$$|Z_j - Z_{j-1}| = c_j. \qquad (8.5.45)$$

We also need to translate between $c_j$, where $j = 1, 2, \ldots, m$, and $d_{i,k}(X_i)$ and
$d_{i,k}(Y_i)$, $\forall i = 1, \ldots, N - L$, $k = 1, \ldots, K$, that correspond to unordered pairs on locus
$k$ of $X_i$ and $Y_i$ respectively. In particular, $\forall i$, $\forall k$, we let

$$c_{(i-1)K+k} = d_{i,k}(X_i), \qquad (8.5.46)$$

$$c_{(N-L+i-1)K+k} = d_{i,k}(Y_i). \qquad (8.5.47)$$

Let $\ell = 2KL + (i - 1)K + k - 1$; we have

$$d_{i,k}(X_i) = \left|\mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \mid H^{(\ell)}, X_i^k] - \mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \mid H^{(\ell)}]\right|.$$

And similarly, let $\ell' = 2KL + (N - L)K + (i - 1)K + k - 1$; we have

$$d_{i,k}(Y_i) = \left|\mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \mid H^{(\ell')}, Y_i^k] - \mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \mid H^{(\ell')}]\right|.$$

We immediately have the following lemmas that we can plug into Azuma's
inequality, where $d_{i,k}$ applies to both $d_{i,k}(X_i)$ and $d_{i,k}(Y_i)$.

Figure 8.5.3. Set of edges that random unordered pairs on $Y_i$ influence


Lemma 8.8. For the $2(N - L)K$ random unordered pairs on unswapped nodes
$X_i, Y_i$, $\forall i \in [1, N - L]$, that we reveal, at locus $k \in [1, K]$,

$$d_{i,k} \le |4L(p_2^k - p_1^k)| + 2t_k\sqrt{L},$$

where $t_k$ and $\Delta$ are defined in Definition 8.14 and Definition 8.15 respectively, and
$\sum_{k=1}^{K} t_k^2 \le \Delta$.

Proof. Assume that before we fix $Y_i^k$, we have reached history $H^{(\ell')}$. W.l.o.g., assume
the pair of random bits that we are fixing is on $Y_i$; we let $Y_i^k = 00$, $01/10$, $11$
respectively and obtain the corresponding $d_{i,k}(Y_i)$, $\forall i, k$; and we go through the same
process to obtain $d_{i,k}(X_i)$, $\forall i, k$.

Recall  that  by  Definition  8.14,  j)  -  ihifJ j))  -  “ 

/fi  (Vj)),  where  /|(/j)  is  defined  in  Definition  8.14  and  |E  [/|(/i)]  |  =  \2L{p\  —  p\)  | 
as shown in Proposition 8.3.

Thus by definition of $d_{i,k}(Y_i)$ and $d_{i,k}(X_i)$, we have


Mwm 

o 

o 

II 

diAYi)  =  ^ 

\H\\f2ih)\ 

T,^  =  ll 

l  \l-2p\\\f2{h)\ 

=  01/10, 

Xf  =  00 

di,k{Xi)  =  < 

\H\\f2{h)\ 

Af  =  11 

,  \l-2p\\\f^{h)\ 

Af  =  01/10. 

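Note these changes reflect scores at locus $k$, which correspond to the set of
edges in $\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)$ which are adjacent to $Y_i$ and $X_i$ respectively, as in
Figure 8.5.3. Hence

$$d_{i,k}(Y_i) \le 2|f_2^k(h)|, \qquad (8.5.48)$$

$$d_{i,k}(X_i) \le 2|f_2^k(h)|. \qquad (8.5.49)$$

Thus given that $h \in \bar{\mathcal{E}}_2^{L}$ and Lemma 8.2, we have

$$d_{i,k}(Y_i) \le 2|f_2^k(h)| \qquad (8.5.50)$$

$$\le 2\left(\left|\mathbf{E}[f_2^k(h)]\right| + t_k\sqrt{L}\right) \qquad (8.5.51)$$

$$= |4L(p_2^k - p_1^k)| + 2t_k\sqrt{L}, \qquad (8.5.52)$$

and similarly,

$$d_{i,k}(X_i) \le |4L(p_2^k - p_1^k)| + 2t_k\sqrt{L}, \qquad (8.5.53)$$

where $\sum_{k=1}^{K} t_k^2 \le \Delta$. □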

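We are now ready to obtain a bound for $\sigma^2 = \sum_{j=1}^{m} c_j^2$, where
$d_{i,k}^2 \le \left(|4L(p_2^k - p_1^k)| + |2\sqrt{L}\,t_k|\right)^2$ applies to unswapped nodes $X_i, Y_i$, $\forall i = 1, \ldots, N - L$,
$\forall k = 1, \ldots, K$, in bounding the differences they cause by revealing the unordered
pairs on locus $k$.

Given that $\sum_{k=1}^{K} t_k^2 \le \Delta$,

$$\sigma^2 = \sum_{i,k}\left(d_{i,k}^2(X_i) + d_{i,k}^2(Y_i)\right) = 2\sum_{i,k} d_{i,k}^2$$

$$\le 2\sum_{i=1}^{N-L}\sum_{k=1}^{K}\left(|4L(p_2^k - p_1^k)| + |2\sqrt{L}\,t_k|\right)^2$$

$$\le 2(N - L)\sum_{k=1}^{K}\left[2\left(4L(p_2^k - p_1^k)\right)^2 + 2\left(2\sqrt{L}\,t_k\right)^2\right]$$

$$= 64L^2(N - L)\sum_{k=1}^{K}(p_2^k - p_1^k)^2 + 16L(N - L)\sum_{k=1}^{K} t_k^2$$

$$\le 64(N - L)L^2(K\gamma) + 16(N - L)L\Delta,$$

where $\Delta = 32N\ln 2 + 16K\ln 2\,(\log\log N + 1) + 6\ln N$ as in Definition 8.15. □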

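We now apply Theorem 8.3 to obtain the following bound on a bad event. Note
that the constant in the following lemma has not been optimized.

Lemma 8.9. Let $h$ be the specific 2KL-history that we record for a balanced cut
$(S, \bar{S})$ such that $h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L}$. Let $p_3 = 2/N^{4L}$. Then,

$$\Pr\left[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) < 0 \,\middle|\, h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L},\ f \text{ at random}\right] \le p_3,$$

given that $K \ge \Omega(\ln N/\gamma)$ and $KN \ge \Omega(\ln N \log\log N/\gamma^2)$ and $N \ge 4$.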

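Proof. By Theorem 8.3 with $t = \mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)] \ge 2KL(N - L)\gamma$,

$$\Pr\left[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) < 0 \,\middle|\, h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L}\right]$$

$$= \Pr_h\left[\mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) \mid H^{(2KN)}] - \mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)] < -\mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)]\right]$$

$$\le 2e^{-t^2/2\sigma^2}, \qquad (8.5.54)$$

where $\sigma^2$ is defined in Theorem 8.3.

In the following calculation, we assume $N \ge 4$ at various places, but not any
larger. Given that $\log\log N \ge 1$, $\forall N \ge 4$, we first rewrite $\sigma^2$ as the following:

$$\sigma^2 \le 64(N - L)L^2(K\gamma) + 16(N - L)L\Delta \qquad (8.5.55)$$

$$\le 64K(N - L)L^2\gamma + 512\ln 2\,K(N - L)L\log\log N + 16(N - L)L(32N\ln 2 + 6\ln N).$$

We will prove that for all $N$, so long as

(1) $K \ge \Omega(\ln N/\gamma)$,

(2) $KN \ge \Omega(\ln N \log\log N/\gamma^2)$,

we will have

$$2e^{-t^2/2\sigma^2} \le 2e^{-(2KL(N-L)\gamma)^2/2\sigma^2} \le \frac{2}{N^{4L}}. \qquad (8.5.56)$$

In what follows, we show that given different values of $N$, by choosing slightly
different constants in (1) and (2), (8.5.56) is always satisfied.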

Case 1: $4 \le N < \log\log N/2\gamma$.

In this case, we require that $KN \ge c_1\ln N\log\log N/\gamma^2$, where $c_1 \ge 1488$, which
immediately implies the following inequalities given that $N < \log\log N/2\gamma$:

(1) $K \ge 2c_1\ln N/\gamma$,

(2) $N \le K\log\log N/(4c_1\ln N)$,

(3) $\log\log N \ge 4\gamma$, $\forall N \ge 4$, i.e., we consider cases where $\gamma$ is small enough,

(4) $\ln N \ge 2\ln 2$, $\forall N \ge 4$.

We first derive the following term that appears in $\sigma^2$ as in (8.5.55):

$$16L(N - L)(32N\ln 2 + 6\ln N)$$

$$\le 512\ln 2\,(N - L)LN + 96(N - L)L\ln N$$

$$\le \frac{128\ln 2\,K(N - L)L\log\log N}{c_1\ln N} + \frac{48\gamma K(N - L)L}{c_1}$$

$$\le \frac{64K(N - L)L\log\log N}{c_1} + \frac{12K(N - L)L\log\log N}{c_1}$$

$$\le \frac{76K(N - L)L\log\log N}{c_1}$$

$$\le K(N - L)L\log\log N,$$

given that $c_1 \ge 1488$.

Next, given that $L\gamma \le N\gamma/2 < \log\log N/4$, we have

$$\sigma^2 \le 64K(N - L)L(L\gamma) + 355K(N - L)L\log\log N + KL(N - L)\log\log N$$

$$\le 16KL(N - L)\log\log N + 356KL(N - L)\log\log N$$

$$\le 372KL(N - L)\log\log N.$$

Finally, given that $KN \ge c_1\ln N\log\log N/\gamma^2$, we have:

$$\frac{(2KL(N - L)\gamma)^2}{2\sigma^2} \ge \frac{4KL(N - L)\gamma^2}{2 \times 372\log\log N} \ge \frac{LKN\gamma^2}{372\log\log N} \ge \frac{c_1 L\ln N}{372} \ge 4L\ln N.$$

Thus we also have $K \ge 2c_1\ln N/\gamma = 2976\ln N/\gamma$, given that $N < \log\log N/2\gamma$.

Case 2: $\log\log N/2\gamma \le N < K\log\log N/20$.

In this case, $K$ and $N$ are close and we require the following.

(1) $K \ge c_2\ln N/\gamma$, where $c_2 = 512$,

(2) $KN \ge c_0\ln N\log\log N/\gamma^2$, where $c_0 = 2000$.

Note that constants $c_0, c_2$ above are not optimized; given any $N$, an optimal
combination of $c_0, c_2$ will result in the lowest possible $K$ given that
$K \ge \max\left\{\frac{c_0\ln N\log\log N}{N\gamma^2}, \frac{c_2\ln N}{\gamma}\right\}$.

Given that $N < K\log\log N/20$, we have:

$$16L(N - L)(32N\ln 2 + 6\ln N) \le 20K(N - L)L\log\log N,$$

and hence

$$\sigma^2 \le 64K(N - L)L^2\gamma + 355K(N - L)L\log\log N + 20K(N - L)L\log\log N$$

$$\le 64(N - L)L^2K\gamma + 375KL(N - L)\log\log N.$$

The following inequalities are due to (1) and (2) respectively,

$$\frac{(2KL(N - L)\gamma)^2}{2 \cdot 64K(N - L)L^2\gamma} \ge 16L\ln N, \qquad (8.5.57)$$

$$\frac{(2KL(N - L)\gamma)^2}{2 \cdot 375KL(N - L)\log\log N} \ge \frac{16}{3}L\ln N, \qquad (8.5.58)$$

and thus

$$2\sigma^2 \le \frac{(2KL(N - L)\gamma)^2}{16L\ln N} + \frac{(2KL(N - L)\gamma)^2}{16L\ln N/3} \qquad (8.5.59)$$

$$= \frac{(2KL(N - L)\gamma)^2}{4L\ln N}, \qquad (8.5.60)$$

and $2e^{-t^2/2\sigma^2} \le 2e^{-(2KL(N-L)\gamma)^2/2\sigma^2} \le 2e^{-4L\ln N} = 2/N^{4L}$.

Case 3: $N \ge K\log\log N/20$.

Here we require that $K = c_3\ln N/\gamma$, for some $c_3$ to be determined. Thus we have
$KN \ge K^2\log\log N/20$, which satisfies the constraint of the form $KN \ge \Omega(\ln N\log\log N/\gamma^2)$
as in other cases.

Given that $N \ge 4$, we have that $\ln N \ge 2\ln 2$ and hence

$$16L(N - L)(32N\ln 2 + 6\ln N) \le 128(N - L)LN\ln N + 6NL(N - L)\ln N$$

$$\le 134(N - L)LN\ln N.$$

Given that $K\log\log N \le 20N$, we have:

$$\sigma^2 \le 64K(N - L)L^2\gamma + 512\ln 2\,(K\log\log N)(N - L)L + 134(N - L)LN\ln N$$

$$\le 64(N - L)L^2(K\gamma) + 512\ln 2 \cdot 20N(N - L)L + 134(N - L)LN\ln N$$

$$\le 64c_3\ln N\,(N - L)L(N/2) + (N - L)LN\ln N\,(128 \cdot 20 + 134)$$

$$\le (32c_3 + 2694)(N - L)LN\ln N.$$

By taking $c_3 = 188$ such that $c_3^2 \ge 4(32c_3 + 2694)$, we have

$$\frac{t^2}{2\sigma^2} \ge \frac{(2K(N - L)L\gamma)^2}{2\sigma^2} = \frac{(2c_3(N - L)L\ln N)^2}{2\sigma^2}$$

$$\ge \frac{2(c_3(N - L)L\ln N)^2}{(32c_3 + 2694)N(N - L)L\ln N}$$

$$= \frac{2c_3^2(N - L)L\ln N}{(32c_3 + 2694)N}$$

$$\ge \frac{c_3^2 L\ln N}{32c_3 + 2694}$$

$$\ge 4L\ln N.$$

Thus $2e^{-t^2/2\sigma^2} \le 2e^{-c_3^2 L\ln N/(32c_3 + 2694)} \le 2e^{-4L\ln N} = 2/N^{4L}$.

In summary, we have the following requirements. Note that $N$ always falls into
one of these cases. For all cases, we require that $K \ge \Omega(\ln N/\gamma)$ (which is implicit
for Case 1); the constant that we require in $K$ for Case 2 is larger than that for
Case 3 (i.e., $c_2 > c_3$ as above), so that the two cases can overlap.

- Case 1: $16 \le N < \log\log N/2\gamma$. We require that $KN \ge 1488\ln N\log\log N/\gamma^2$, which
implies that $K \ge 2976\ln N/\gamma$.

- Case 2: $\log\log N/2\gamma \le N < K\log\log N/20$. We require that $K \ge 512\ln N/\gamma$ and
$KN \ge 2000\ln N\log\log N/\gamma^2$.

- Case 3: $N \ge K\log\log N/20$. We require $K \ge 188\ln N/\gamma$.

□
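The case analysis above can be checked numerically for concrete parameters. The following is a minimal sketch, assuming base-2 logarithms for $\log\log N$ (consistent with $\log\log N \ge 1$ for $N \ge 4$) and natural logarithms for $\ln N$; the helper names `case_of`, `meets_requirements`, and `exponent` are hypothetical and introduced only for illustration. It evaluates the variance bound of Theorem 8.3 with $\Delta$ as in Definition 8.15 and verifies that the Azuma exponent is at least $4L\ln N$ for every $L$.

```python
import math

def loglog(N):
    # log log N, assuming base-2 logarithms (an assumption of this sketch)
    return math.log2(math.log2(N))

def delta(N, K):
    # Delta = 32 N ln 2 + 16 K ln 2 (log log N + 1) + 6 ln N
    return 32 * N * math.log(2) + 16 * K * math.log(2) * (loglog(N) + 1) + 6 * math.log(N)

def case_of(N, K, gamma):
    # Which of the three cases in the summary the triple (N, K, gamma) falls into
    if N < loglog(N) / (2 * gamma):
        return 1
    if N < K * loglog(N) / 20:
        return 2
    return 3

def meets_requirements(N, K, gamma):
    # Checks the (non-optimized) constants stated for the relevant case
    case = case_of(N, K, gamma)
    ln_n, ll = math.log(N), loglog(N)
    if case == 1:
        return K * N >= 1488 * ln_n * ll / gamma**2
    if case == 2:
        return K >= 512 * ln_n / gamma and K * N >= 2000 * ln_n * ll / gamma**2
    return K >= 188 * ln_n / gamma

def exponent(N, L, K, gamma):
    # t^2 / (2 sigma^2) with t = 2KL(N-L)gamma and the sigma^2 upper bound of Theorem 8.3
    t = 2 * K * L * (N - L) * gamma
    sigma2 = 64 * L**2 * (N - L) * K * gamma + 16 * L * (N - L) * delta(N, K)
    return t * t / (2 * sigma2)

N, K, gamma = 64, 8600, 0.25
ok = all(exponent(N, L, K, gamma) >= 4 * L * math.log(N) for L in range(1, N // 2 + 1))
print(case_of(N, K, gamma), meets_requirements(N, K, gamma), ok)
```

An exponent of at least $4L\ln N$ is exactly the condition under which $2e^{-t^2/2\sigma^2} \le 2/N^{4L}$, i.e., (8.5.56).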


In summary, instead of bounding the deviation of a random variable
$\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L)$ in a complete random space with all nodes taking random unordered
pairs across all loci, we first reveal pairs of bits on all swapped nodes
$U_j, V_j$, $\forall j \in [1, L]$, and record $h$ for each $(S, \bar{S})$. We exclude two bad events $\mathcal{E}_1^{L}$,
$\mathcal{E}_2^{L}$ from $h$ for each $(S, \bar{S})$ in the original probability space $(\Omega, \mathcal{F}, \Pr)$, where all
2NK bits are at random. All histories that we consider for all balanced $(S, \bar{S})$ are
good from this point on, and we can then apply bounded differences analysis in
$(\Omega_h, \Sigma(\Omega_h), \Pr_h)$ for all balanced cuts.

8.6  Mapping Back to Product Space Given $\bar{\mathcal{E}}_1$

We  first  prove  one  utility  lemma. 

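Lemma 8.10. For all balanced $(S, \bar{S})$, $\Pr[h \in \mathcal{E}_2^{L} \mid h \in \bar{\mathcal{E}}_1^{L}] \le \frac{p_2}{1 - 2L/N^{32}}$; hence

$$\Pr[h \in \bar{\mathcal{E}}_2^{L} \mid h \in \bar{\mathcal{E}}_1^{L}] \ge 1 - \frac{p_2}{1 - 2L/N^{32}}.$$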

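Proof. Given the following equations:

$$\Pr[h \in \mathcal{E}_2^{L}] = \Pr[h \in \mathcal{E}_2^{L} \mid h \in \bar{\mathcal{E}}_1^{L}] \cdot \Pr[h \in \bar{\mathcal{E}}_1^{L}] + \Pr[h \in \mathcal{E}_2^{L} \mid h \in \mathcal{E}_1^{L}] \cdot \Pr[h \in \mathcal{E}_1^{L}], \qquad (8.6.61)$$

$$\Pr[h \in \bar{\mathcal{E}}_1^{L}] = \left(1 - \frac{1}{N^{32}}\right)^{2L} \ge 1 - 2L/N^{32}, \qquad (8.6.62)$$

we have:

$$\Pr[h \in \mathcal{E}_2^{L} \mid h \in \bar{\mathcal{E}}_1^{L}] = \frac{\Pr[h \in \mathcal{E}_2^{L}] - \Pr[h \in \mathcal{E}_2^{L} \mid h \in \mathcal{E}_1^{L}] \cdot \Pr[h \in \mathcal{E}_1^{L}]}{\Pr[h \in \bar{\mathcal{E}}_1^{L}]} \qquad (8.6.63)$$

$$\le \frac{\Pr[h \in \mathcal{E}_2^{L}]}{\Pr[h \in \bar{\mathcal{E}}_1^{L}]} \qquad (8.6.64)$$

$$\le \frac{p_2}{1 - 2L/N^{32}}. \qquad (8.6.65)$$

□

Lemma 8.11. For a balanced cut $(S, \bar{S})$, where its history $h$ is conditioned on $\bar{\mathcal{E}}_1$,

$$\Pr\left[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) < 0 \mid \bar{\mathcal{E}}_1\right] \le \frac{p_2}{1 - 2L/N^{32}} + \frac{p_3}{1 - 2(N - L)/N^{32}}.$$

Proof. By assumption of independence between node events,

$$\Pr[h \in \mathcal{E}_2^{L} \mid \bar{\mathcal{E}}_1] = \Pr[h \in \mathcal{E}_2^{L} \mid h \in \bar{\mathcal{E}}_1^{L} \cap f \in \bar{\mathcal{E}}_1^{N-L}] = \Pr[h \in \mathcal{E}_2^{L} \mid h \in \bar{\mathcal{E}}_1^{L}] \le \frac{p_2}{1 - 2L/N^{32}}.$$

When $h \in \mathcal{E}_2^{L}$, we give up trying to bound $\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) < 0$; hence

$$\Pr\left[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) < 0 \mid \bar{\mathcal{E}}_1\right] \le \Pr[h \in \mathcal{E}_2^{L} \mid \bar{\mathcal{E}}_1] + \Pr\left[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) < 0 \mid (h, f) \in \bar{\mathcal{E}}_1 \cap h \in \bar{\mathcal{E}}_2^{L}\right] \cdot \Pr[h \in \bar{\mathcal{E}}_2^{L} \mid \bar{\mathcal{E}}_1]$$

$$\le \frac{p_2}{1 - 2L/N^{32}} + \frac{p_3}{1 - 2(N - L)/N^{32}},$$

where the last inequalities are due to Lemma 8.10 and Lemma 8.12. □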

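Lemma 8.12. For a balanced cut $(S, \bar{S})$,

$$\Pr\left[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) < 0 \mid (h, f) \in \bar{\mathcal{E}}_1 \cap h \in \bar{\mathcal{E}}_2^{L}\right] \le \frac{p_3}{1 - 2(N - L)/N^{32}}. \qquad (8.6.66)$$

Proof. We use $e_0$ to replace $\{\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) < 0\}$ and bound the following:
$\Pr[e_0 \mid (h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L}) \cap f \in \bar{\mathcal{E}}_1^{N-L}]$, which is the same as the term in the statement
of the lemma,

$$\Pr[e_0 \mid h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L},\ f \text{ at random}] =$$

$$\Pr[e_0 \mid (h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L}) \cap f \in \bar{\mathcal{E}}_1^{N-L}] \cdot \Pr[f \in \bar{\mathcal{E}}_1^{N-L} \mid h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L}] \;+ \qquad (8.6.67)$$

$$\Pr[e_0 \mid (h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L}) \cap f \in \mathcal{E}_1^{N-L}] \cdot \Pr[f \in \mathcal{E}_1^{N-L} \mid h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L}]. \qquad (8.6.68)$$

By independence between node events:

$$\Pr[f \in \bar{\mathcal{E}}_1^{N-L} \mid h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L}] = \Pr[f \in \bar{\mathcal{E}}_1^{N-L}], \qquad (8.6.69)$$

$$\Pr[f \in \mathcal{E}_1^{N-L} \mid h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L}] = \Pr[f \in \mathcal{E}_1^{N-L}]. \qquad (8.6.70)$$

Given that events defined on 2L swapped nodes are independent of
events on $2(N - L)$ unswapped nodes, we have the following, where we omit
writing out the $f$ at random condition.

$$\Pr[e_0 \mid (h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L}) \cap f \in \bar{\mathcal{E}}_1^{N-L}]$$

$$= \frac{\Pr[e_0 \mid h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L}] - \Pr[e_0 \mid (h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L}) \cap f \in \mathcal{E}_1^{N-L}] \cdot \Pr[f \in \mathcal{E}_1^{N-L}]}{\Pr[f \in \bar{\mathcal{E}}_1^{N-L}]}$$

$$\le \frac{\Pr[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) < 0 \mid h \in \bar{\mathcal{E}}_1^{L} \cap \bar{\mathcal{E}}_2^{L}]}{\Pr[f \in \bar{\mathcal{E}}_1^{N-L}]} \qquad (8.6.71)$$

$$\le \frac{p_3}{1 - 2(N - L)/N^{32}}, \qquad (8.6.72)$$

where $\Pr[f \in \bar{\mathcal{E}}_1^{N-L}] \ge 1 - 2(N - L)/N^{32}$ following a proof similar to that of Lemma 8.3.

□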


8.7  Putting  Things  Together 

Finally,  we  prove  Theorem  8.1. 

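Proof of Theorem 8.1: Let $\mathcal{E}_3$ be the event that any balanced cut $(S, \bar{S})$ has a score
higher than that of perfect partition $\mathcal{T}$,

$$\Pr[\mathcal{E}_3] \le \Pr[\mathcal{E}_1] + \sum_{(S, \bar{S})} \Pr\left[\mathrm{diff}(\mathcal{T}, (S, \bar{S}), L) < 0 \mid \bar{\mathcal{E}}_1\right]$$

$$\le \left(1 - \left(1 - \frac{1}{N^{32}}\right)^{2N}\right) + \sum_{(S, \bar{S})}\left(\frac{p_2}{1 - 2L/N^{32}} + \frac{p_3}{1 - 2(N - L)/N^{32}}\right)$$

$$\le \frac{2N}{N^{32}} + \sum_{(S, \bar{S})}\frac{p_2}{1 - 2L/N^{32}} + \sum_{(S, \bar{S})}\frac{p_3}{1 - 2(N - L)/N^{32}}$$

$$= O(1/\mathrm{poly}(N)),$$

where $h$ is the specific 2KL-history that we record for each $(S, \bar{S})$ after we exclude
bad events $\mathcal{E}_1$, $\mathcal{E}_2$ from probability space $(\Omega, \mathcal{F}, \Pr)$. □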


9  Learning  Product  Distributions 


9.1  Introduction 

After exploring the power of drawing two random vectors from its product distribution,
i.e., two random draws from each of the $K$ dimensional distributions, for each
sample point, we ponder the possibility of achieving the same power of clustering
using a single random draw from each of the $K$ dimensional distributions for
each sample point. Let us first formally define a product distribution.

Definition 9.1. A product distribution $D_m$, $\forall m = 1, 2$, over Boolean cube $\{0, 1\}^K$ is
characterized by its expected value $p_m = (p_m^1, \ldots, p_m^K) \in [0, 1]^K$, which we refer to
as the center of $D_m$.

We then restate our problem as a fundamental problem of learning mixtures
of two product distributions over discrete domains, in particular, over the $K$-dimensional
Boolean cube $\{0, 1\}^K$, where $K$ is a variable whose value we need
to resolve. Given a small sample, i.e., when $N$ is small, can we learn the perfect
partition with a small number of attributes from each sample point such that, for
each attribute, we are given only a single bit that is randomly drawn from its corresponding
Bernoulli distribution?

We finish this section by giving some more notation, followed by two results.
We use $X = x = (x^1, \ldots, x^K)$ to represent a random $K$-bit vector, given a set of
$K$ attributes. Sometimes we also use $x_j^i$ to represent the $i$-th coordinate of point $X_j$.

Definition 9.2. A random vector $x$ from the distribution $p_m$, which we denote as
$x \sim p_m$, is generated by independently selecting each coordinate $x^i$ to be 1 with
probability $p_m^i$, $\forall i$, $\forall m$.

We first define an alternative measure for the average distance between two
product distributions $D_1$, $D_2$ across their $K$ dimensions.

Definition 9.3. $\|D_1 - D_2\| = \frac{1}{K}\sum_{i=1}^{K} |p_1^i - p_2^i|$.

We prove in Section 9.2 Theorem 9.2, which holds under a special condition
such that we know whether $p_1^i \ge p_2^i$, or vice versa, $\forall i$. Under this condition, Theorem 9.2
states a result that is similar to Theorem 7.2 (in Section 7.2), i.e., the
Global Optimum Lemma, using only $K = O(\ln N/\alpha^2)$ attributes, where $|\alpha| \ge \gamma$ is
the same measure as in Definition 9.3. Similar to Theorem 7.2, we do not require a
balanced input instance.

We next use the inner-product of two $K$-dimensional vectors $x$ and $y$ as the
Rscore between $X$ and $Y$, as in Definition 9.4, and define a complete graph where
nodes are sample points and edge weight is the Rscore between the two points.

Definition 9.4. $\mathrm{Rscore}(X, Y) = \langle x, y\rangle = \sum_{i=1}^{K} x^i y^i$.

Similar to Section 8.1, we define Rscore for a cut $(S, \bar{S})$ as the sum of Rscores
over the set of edges in $(S, \bar{S})$. Let $P_1$ represent the set of points $X_1, X_2, \ldots, X_N$ from
a product distribution $D_1$, and $P_2$ represent the set of points $Y_1, Y_2, \ldots, Y_N$ from
a product distribution $D_2$. We show that the perfect partition $\mathcal{T} = (P_1, P_2)$ is the
minimum cut (min-cut) in terms of Rscore among all balanced cuts $(S, \bar{S})$, both
in expectation and with high probability, despite the deviation of an individual
Rscore or the sum of Rscores over a set of edges from its expectation. Formally,

Theorem 9.1. Given $K = \Omega(\ln N/\gamma)$ attributes from each of the 2N sample points,
where $N$ points come from each distribution, and $KN = \Omega(\ln N\log\log N/\gamma^2)$, $N \ge 4$,
with probability $1 - 1/\mathrm{poly}(N)$, for all other balanced cuts $(S, \bar{S})$ in the complete
graph formed among 2N sample points,

$$\mathrm{Rscore}(\mathcal{T}) < \mathrm{Rscore}(S, \bar{S}).$$
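To make the statement concrete, the following sketch computes Rscores as in Definition 9.4, sums them over the edges of a cut, and finds the minimum balanced cut by brute force on a tiny sample. It is an illustration only: the centers (0.9 and 0.1 on every attribute), the sample sizes, and the helper names `sample`, `rscore`, `cut_rscore`, and `min_balanced_cut` are assumptions of this example, and exhaustive enumeration is feasible only for very small $N$.

```python
import itertools
import random

def sample(p, rng):
    """One K-bit point from the product distribution with center p."""
    return [1 if rng.random() < pi else 0 for pi in p]

def rscore(x, y):
    """Inner product of two 0/1 vectors, as in Definition 9.4."""
    return sum(a * b for a, b in zip(x, y))

def cut_rscore(points, side):
    """Sum of Rscores over all edges crossing the cut (side, complement)."""
    other = [j for j in range(len(points)) if j not in side]
    return sum(rscore(points[i], points[j]) for i in side for j in other)

def min_balanced_cut(points):
    """Brute-force minimum over balanced cuts; each cut is counted once by fixing index 0."""
    n = len(points)
    best = None
    for side in itertools.combinations(range(n), n // 2):
        if 0 not in side:
            continue
        score = cut_rscore(points, set(side))
        if best is None or score < best[0]:
            best = (score, set(side))
    return best

rng = random.Random(1)
K, N = 200, 4
p1, p2 = [0.9] * K, [0.1] * K            # hypothetical, strongly separated centers
points = [sample(p1, rng) for _ in range(N)] + [sample(p2, rng) for _ in range(N)]
best_score, best_side = min_balanced_cut(points)
print(best_score, sorted(best_side))
```

With these centers, an edge between two points of the first population has expected Rscore $0.81K$, while a cross-population edge has expected Rscore $0.09K$, so any balanced cut that separates same-population points pays a much larger total score than the perfect partition.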

It is easy to check that, following the same line of arguments in this chapter
for Theorem 9.1, using scores based on pairwise Hamming distances, i.e., $\forall X, Y$,
$H(x, y) = \sum_{i=1}^{K} |x^i - y^i|$, the max-cut will identify the perfect partition with high
probability, given the same order of number of attributes.

We also note that these inner-product based or Hamming distance based scores
cannot give results that are similar to the Global and Local Optimum lemmas in
Chapter 7.

9.2  An  Alternative  Score 

In this section, we prove Theorem 9.2, which is similar to Theorem 7.2, while requiring
only a single bit from each attribute, under the condition that we know
whether $p_1^i \ge p_2^i$, or vice versa, $\forall i$. We design a new score, which we call Bscore,
such that, with high probability, the absolute values of Bscores, each defined over
$K = \Omega(\ln N/\alpha^2)$ attributes, between points from the same distribution are consistently
lower than those between sample points from different distributions, where
$\alpha$ is the same as the measure in Definition 9.3.

To simplify presentation, the Bscore we define in this section is based on the
assumption that $p_1^i \ge p_2^i$, $\forall i$. When this assumption is not true, we only need to alter
the definition of the Bscore such that for the set of attributes with $p_1^i \ge p_2^i$, we add $+$
signs, while for the other set of attributes we add $-$ signs before the $\mathrm{Bscore}^i$ that we
currently define.

Definition 9.5. For an unordered pair of points $(X, Y)$, let

$$\mathrm{Bscore}^i(X, Y) = (x^i - y^i), \qquad (9.2.1)$$

$$\mathrm{Bscore}(X, Y) = \sum_{i=1}^{K} \mathrm{Bscore}^i(X, Y).$$

Under this assumption, the $\alpha$ as in Definition 9.6 is exactly the same measure
as in Definition 9.3 and hence $|\alpha| \ge \gamma$.

Definition 9.6. $\alpha = (1/K)\sum_{i=1}^{K} \alpha^i$, where $\alpha^i = p_1^i - p_2^i$.

Definition 9.7. For two sample points $X \in P_a$ and $Y \in P_b$, where $a, b \in \{1, 2\}$,

$$\alpha(X, Y) = -\alpha(Y, X) = (1/K)\sum_{i=1}^{K}(p_a^i - p_b^i).$$

And hence $|\alpha(X, Y)| = |\alpha(Y, X)|$.

Thus whether we subtract $Y$ from $X$, or vice versa, it has a similar effect in
terms of Theorem 9.2. We first show the following lemma.

Lemma 9.1. Let $X, Y$ come from different distributions, and $Z_1, Z_2$ be of common
origin. Then

$$\mathbf{E}[\mathrm{Bscore}(X, Y)] = K\alpha(X, Y),$$

$$\mathbf{E}[\mathrm{Bscore}(Z_1, Z_2)] = 0.$$

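Proof. Given $\alpha(X, Y)$ as in Definition 9.7, for $X \in P_a$ and $Y \in P_b$,

$$\mathbf{E}[\mathrm{Bscore}(X, Y)] = \sum_{i=1}^{K} \mathbf{E}[\mathrm{Bscore}^i(X, Y)] \qquad (9.2.2)$$

$$= \sum_{i=1}^{K} \mathbf{E}[x^i - y^i] \qquad (9.2.3)$$

$$= \sum_{i=1}^{K} (p_a^i - p_b^i) \qquad (9.2.4)$$

$$= K\alpha(X, Y). \qquad (9.2.5)$$

Given that all frequencies are exactly the same for $Z_1, Z_2$, it follows that
$\mathbf{E}[\mathrm{Bscore}(Z_1, Z_2)] = 0$. □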

The following theorem does not assume a balanced input instance.

Theorem 9.2. Let 2N be the size of the sample. Given that $K \ge 72\ln N/\alpha^2$, with
probability $1 - O(1/N^2)$, for all points $X, Y$ that come from different distributions,
and for all points $Z_1, Z_2$ that come from the same distribution,

$$|\mathrm{Bscore}(X, Y)| > 2K|\alpha|/3,$$

$$|\mathrm{Bscore}(Z_1, Z_2)| < K|\alpha|/3.$$

Proof. We first use the Hoeffding bound to prove the following lemma.
W.l.o.g., assume that $X \in P_1$ and $Y \in P_2$ and hence $\alpha(X, Y) > 0$.

Lemma 9.2. Given that $K \ge 72\ln N/\alpha^2$,

$$\Pr[|\mathrm{Bscore}(Z_1, Z_2)| \ge K\alpha/3] \le 2/N^4,$$

$$\Pr[|\mathrm{Bscore}(X, Y)| \le 2K\alpha/3] \le 2/N^4.$$

Proof. Given $\mathbf{E}[\mathrm{Bscore}(Z_1, Z_2)] = 0$ and that $\mathrm{Bscore}(Z_1, Z_2) = \sum_{i=1}^{K} \mathrm{Bscore}^i(Z_1, Z_2)$
is the sum of $K$ independent random variables with values in $[-1, 1]$, using the Hoeffding
bound as in Theorem 7.1, by taking $t = K\alpha/3K = \alpha/3$,

$$\Pr[|\mathrm{Bscore}(Z_1, Z_2)| \ge K\alpha/3] \le 2e^{-2K(\alpha/3)^2/(2)^2} \le 2/N^4, \qquad (9.2.6)$$

and similarly, given that $\mathbf{E}[\mathrm{Bscore}(Y, X)] = -\mathbf{E}[\mathrm{Bscore}(X, Y)] = -K\alpha$, we use one
of the following depending on the order of $X, Y$ that we use for subtraction.

$$\Pr[\mathrm{Bscore}(X, Y) \le 2K\alpha/3] = \Pr[-\mathrm{Bscore}(X, Y) + \mathbf{E}[\mathrm{Bscore}(X, Y)] \ge K\alpha/3] \qquad (9.2.7)$$

$$\le e^{-2K(\alpha/3)^2/(2)^2} \qquad (9.2.8)$$

$$\le 1/N^4, \qquad (9.2.9)$$

$$\Pr[\mathrm{Bscore}(Y, X) \ge -2K\alpha/3] = \Pr[\mathrm{Bscore}(Y, X) - \mathbf{E}[\mathrm{Bscore}(Y, X)] \ge K\alpha/3] \qquad (9.2.10)$$

$$\le e^{-2K(\alpha/3)^2/(2)^2} \qquad (9.2.11)$$

$$\le 1/N^4. \qquad (9.2.12)$$

□


By the union bound, for $\alpha > 0$, the probability that any event of type
$|\mathrm{Bscore}(X,Y)| \le 2K\alpha/3$ or type $|\mathrm{Bscore}(Z_1,Z_2)| \ge K\alpha/3$ happens is at most
$8N^2/N^4$, since the total number of such events is $2N(2N-1)$. Thus the theorem
holds. □

Thus a simple threshold based algorithm will identify a perfect partition by
taking the set of edges that have absolute values larger than a certain threshold
in the complete graph. In particular, the perfect partition in such a graph has a
maximum average score, i.e., the total score across edges in the perfect partition
divided by the number of such edges.
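The threshold rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-coordinate form of `bscore` (the sum of $x^i - y^i$) is inferred from the requirement that each term lie in $[-1, 1]$, and the names `bscore`, `threshold_partition`, and the toy `points` are hypothetical, not taken from the thesis.

```python
from typing import List, Sequence, Set, Tuple


def bscore(x: Sequence[int], y: Sequence[int]) -> int:
    # Assumed form of Bscore: sum over coordinates of x^i - y^i, so that
    # every term lies in [-1, 1] as the Hoeffding argument requires.
    return sum(xi - yi for xi, yi in zip(x, y))


def threshold_partition(
    points: List[Sequence[int]], threshold: float
) -> Tuple[Set[int], Set[int]]:
    # Keep only edges of the complete graph whose |score| exceeds the
    # threshold; under Theorem 9.2 these join points of different origin.
    anchor = 0
    far = {
        j
        for j in range(len(points))
        if abs(bscore(points[anchor], points[j])) > threshold
    }
    near = set(range(len(points))) - far
    return near, far


# Toy sample: the first two points favour the first three coordinates.
points = [[1, 1, 1, 0], [1, 1, 0, 1], [0, 0, 0, 1], [0, 0, 1, 0]]
near, far = threshold_partition(points, threshold=1.0)
```

In the regime of Theorem 9.2 any threshold strictly between $K|\alpha|/3$ and $2K|\alpha|/3$ would separate the two kinds of edges; the value used here is chosen by hand for the toy data.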

Remark 9.1. By definition of $\gamma$ and $\alpha$, it is easy to verify that $\gamma \ge \alpha^2$. Hence a
bound of $K = \Omega(\ln N/\gamma)$ is tighter than that of $K = \Omega(\ln N/\alpha^2)$. We give bounds
based on $\gamma$ in Section 7.3 and Chapter 8, and also in the rest of this chapter.
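The inequality in Remark 9.1 can be seen from the Cauchy-Schwarz inequality. The short derivation below is a sketch that assumes $\alpha = \frac{1}{K}\sum_i (p_1^i - p_2^i)$, as suggested by (9.2.4) and (9.2.5), and $K\gamma = \|p_1 - p_2\|_2^2$, as in Proposition 9.3; Definition 9.7 should be consulted for the exact form of $\alpha$.

```latex
\begin{align*}
\alpha^2
  &= \Big(\frac{1}{K}\sum_{i=1}^{K} (p_1^i - p_2^i)\cdot 1\Big)^2 \\
  &\le \frac{1}{K^2}\Big(\sum_{i=1}^{K} (p_1^i - p_2^i)^2\Big)\Big(\sum_{i=1}^{K} 1^2\Big) \\
  &= \frac{1}{K}\,\|p_1 - p_2\|_2^2 \;=\; \gamma .
\end{align*}
```

Equality holds only when all coordinate differences $p_1^i - p_2^i$ are equal, which is why a bound stated in terms of $\gamma$ can be strictly better.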

9.3  Overview  of  Key  Ideas 

We give an overview of the key ideas that we use to prove Theorem 9.1.

The Model and Notation. We use $x$ to denote a $K$-bit vector that corresponds to $X$
in the sample, and $x^i$ its $i$-th component. We write $x \sim D_m$, $m = 1, 2$, to represent that
$x$ is a vector that is generated from the product distribution $D_m$, and thus
\[ \mathbf{E}_{x \sim D_m}[x] = p_m, \quad \forall m = 1, 2. \tag{9.3.13} \]

Recall that we build a complete graph such that sample points map to nodes
in the graph and an edge between two nodes $X$ and $Y$ is given a weight of
$\mathrm{Rscore}(X,Y)$ as in Definition 9.4.

The key idea that makes an inner-product based score work is the following. From
the perspective of an individual sample point, e.g., $Y$, the quantity $\mathrm{diff}(Y)$ may not be
significant due to the definition of our Rscore (diff here is similar to that in
Definition 7.6, except that the Pscores there are replaced with Rscores; see
Definition 9.8). However, the sum of diffs over a pair of swapped nodes, e.g.,
$\mathrm{diff}(X) + \mathrm{diff}(Y)$ as in Figure 9.3.1, is significant despite a bounded amount of
deviation. Indeed, we prevent the sum $\mathrm{diff}(X) + \mathrm{diff}(Y)$ from deviating too much
from its expected value, $K\gamma$, by excluding from $X$ and $Y$ the bad node events of
Definition 9.10, which are in essence similar to those in Definition 8.12.

Therefore, the advantage of a perfect partition over all other balanced cuts, i.e.,
the sum of diffs across $2L(N - L)$ pairs of edges from swapped nodes to unswapped
nodes, as shown in Figure 8.1.1, is significant enough that the perfect partition
can almost always win over all other balanced partitions in terms of the particular
measure (minimum total score here), despite the deviation events that we handle in
Section 9.4.2.

The inspiration for using an inner-product based score and pairing up
$\mathrm{diff}(X), \mathrm{diff}(Y)$ such that $X \sim D_1$ and $Y \sim D_2$ comes from Freund and Mansour
[1999], who carried out a similar analysis up to Proposition 9.3. The rest of the proof
follows exactly that in Chapter 8. Hence we only rewrite the propositions,
lemmas and claims that have changed slightly due to the definition of this new
score.


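Figure 9.3.1. Given Dots $\sim p_1$ and Triangles $\sim p_2$. Define $\mathrm{diff}(X) = \mathbf{E}[c|X] - \mathbf{E}[b|X]$
and $\mathrm{diff}(Y) = \mathbf{E}[d|Y] - \mathbf{E}[a|Y]$. Given $K = \Omega(\ln N/\gamma)$, with high probability,
$\mathrm{diff}(X) + \mathrm{diff}(Y) \ge K\gamma/2$, given that $\mathbf{E}_{X \sim D_1}[\mathrm{diff}(X)] + \mathbf{E}_{Y \sim D_2}[\mathrm{diff}(Y)] = K\gamma$. Hence
$a + b < c + d$ with high probability, given also that $KN = \Omega(\ln N \log\log N/\gamma^2)$.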


9.3.1  The  Expected  Difference  of  Two  Edges 


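Proposition 9.1. $\forall a, b = 1, 2$, $\mathbf{E}_{x \sim D_a,\, y \sim D_b}[\langle x, y \rangle] = \langle p_a, p_b \rangle$.

Proof. We have, $\forall a, b = 1, 2$,
\[ \mathbf{E}[\langle x, y \rangle] = \mathbf{E}\Big[\sum_{i=1}^{K} x^i y^i\Big] = \sum_{i=1}^{K} \mathbf{E}[x^i]\,\mathbf{E}[y^i] = \sum_{i=1}^{K} p_a^i p_b^i = \langle p_a, p_b \rangle. \]
□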
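Proposition 9.1 can be checked exactly on a small instance by enumerating every pair of bit vectors. The sketch below assumes that $x$ and $y$ are drawn independently from product distributions with Bernoulli coordinates; the helper names and the frequency vectors `p1`, `p2` are illustrative choices, not values from the thesis.

```python
from fractions import Fraction
from itertools import product
from typing import Sequence


def prob(bits: Sequence[int], p: Sequence[Fraction]) -> Fraction:
    # Probability of a bit vector under independent Bernoulli(p^i) coordinates.
    out = Fraction(1)
    for b, pi in zip(bits, p):
        out *= pi if b == 1 else 1 - pi
    return out


def expected_inner_product(
    pa: Sequence[Fraction], pb: Sequence[Fraction]
) -> Fraction:
    # Exact E[<x, y>] for independent x ~ D_a and y ~ D_b, by enumeration.
    k = len(pa)
    total = Fraction(0)
    for x in product((0, 1), repeat=k):
        for y in product((0, 1), repeat=k):
            dot = sum(xi * yi for xi, yi in zip(x, y))
            total += prob(x, pa) * prob(y, pb) * dot
    return total


p1 = [Fraction(1, 2), Fraction(3, 4), Fraction(1, 5)]
p2 = [Fraction(1, 3), Fraction(1, 4), Fraction(4, 5)]
lhs = expected_inner_product(p1, p2)
rhs = sum(a * b for a, b in zip(p1, p2))  # <p_1, p_2>
```

Exact rational arithmetic is used so that the two sides can be compared with equality rather than a tolerance.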

Definition 9.8. Let $X$ be a sample point from distribution $D_1$ and $Y$ be a sample
point from $D_2$. Let $X'$, $Y'$ be points randomly drawn from $D_1$ and $D_2$ respectively. Then
\begin{align*}
\mathrm{diff}(X) &= \mathbf{E}_{X' \sim D_1}[\mathrm{Rscore}(X,X')] - \mathbf{E}_{Y' \sim D_2}[\mathrm{Rscore}(X,Y')], \\
\mathrm{diff}(Y) &= \mathbf{E}_{Y' \sim D_2}[\mathrm{Rscore}(Y,Y')] - \mathbf{E}_{X' \sim D_1}[\mathrm{Rscore}(Y,X')],
\end{align*}
where expectations are taken over all possible realizations of $X'$, $Y'$ respectively.
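Proposition 9.2. Let $X$ be a sample point from $D_1$ and $Y$ be a point from $D_2$. Then
\[ \mathrm{diff}(X) = \sum_{i=1}^{K} x^i (p_1^i - p_2^i), \qquad \mathrm{diff}(Y) = \sum_{i=1}^{K} y^i (p_2^i - p_1^i). \]

Proof. By Definition 9.8, we have
\begin{align}
\mathrm{diff}(X) &= \mathbf{E}_{X' \sim D_1}[\mathrm{Rscore}(X,X')] - \mathbf{E}_{Y' \sim D_2}[\mathrm{Rscore}(X,Y')] \notag \\
&= \mathbf{E}_{x' \sim D_1}[\langle x, x' \rangle] - \mathbf{E}_{y' \sim D_2}[\langle x, y' \rangle] \tag{9.3.14} \\
&= \langle x, p_1 - p_2 \rangle \tag{9.3.15} \\
&= \sum_{i=1}^{K} x^i (p_1^i - p_2^i), \tag{9.3.16} \\
\mathrm{diff}(Y) &= \mathbf{E}_{Y' \sim D_2}[\mathrm{Rscore}(Y,Y')] - \mathbf{E}_{X' \sim D_1}[\mathrm{Rscore}(Y,X')] \notag \\
&= \mathbf{E}_{y' \sim D_2}[\langle y, y' \rangle] - \mathbf{E}_{x' \sim D_1}[\langle y, x' \rangle] \tag{9.3.17} \\
&= \langle y, p_2 - p_1 \rangle \tag{9.3.18} \\
&= \sum_{i=1}^{K} y^i (p_2^i - p_1^i). \tag{9.3.19}
\end{align}
□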


We next show that the sum of two expected differences over $X$ from $D_1$ and $Y$
from $D_2$ is significant.

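Proposition 9.3. $\mathbf{E}_{X \sim D_1}[\mathrm{diff}(X)] + \mathbf{E}_{Y \sim D_2}[\mathrm{diff}(Y)] = \|p_1 - p_2\|_2^2 = K\gamma$.

Proof. By Proposition 9.2,
\begin{align*}
\mathbf{E}_{X \sim D_1}[\mathrm{diff}(X)] + \mathbf{E}_{Y \sim D_2}[\mathrm{diff}(Y)] &= \sum_{i=1}^{K} p_1^i (p_1^i - p_2^i) + \sum_{i=1}^{K} p_2^i (p_2^i - p_1^i) \\
&= \langle p_1, p_1 - p_2 \rangle + \langle p_2, p_2 - p_1 \rangle \\
&= K\gamma.
\end{align*}
□

Hence let us denote, w.l.o.g., $\eta = \mathbf{E}_{X \sim D_1}[\mathrm{diff}(X)] \ge K\gamma/2$, and thus
\[ \mathbf{E}_{Y \sim D_2}[\mathrm{diff}(Y)] = K\gamma - \eta. \]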
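As a concrete check of Propositions 9.2 and 9.3, the following sketch evaluates diff for given bit vectors and confirms that the two expected diffs add up to $\|p_1 - p_2\|_2^2$. It assumes $\mathrm{Rscore}(X,Y) = \langle x, y \rangle$, as used in the proof of Proposition 9.2; the function names and the frequency vectors are hypothetical.

```python
from fractions import Fraction
from typing import Sequence


def diff_x(x: Sequence[int], p1: Sequence[Fraction], p2: Sequence[Fraction]) -> Fraction:
    # diff(X) = <x, p_1 - p_2> for a point X drawn from D_1.
    return sum((xi * (a - b) for xi, a, b in zip(x, p1, p2)), Fraction(0))


def diff_y(y: Sequence[int], p1: Sequence[Fraction], p2: Sequence[Fraction]) -> Fraction:
    # diff(Y) = <y, p_2 - p_1> for a point Y drawn from D_2.
    return sum((yi * (b - a) for yi, a, b in zip(y, p1, p2)), Fraction(0))


def expected_diffs(p1: Sequence[Fraction], p2: Sequence[Fraction]):
    # By linearity, E[diff(X)] replaces x^i with p_1^i, and E[diff(Y)]
    # replaces y^i with p_2^i.
    e_x = sum((a * (a - b) for a, b in zip(p1, p2)), Fraction(0))
    e_y = sum((b * (b - a) for a, b in zip(p1, p2)), Fraction(0))
    return e_x, e_y


p1 = [Fraction(3, 4), Fraction(1, 2), Fraction(1, 4)]
p2 = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
eta, rest = expected_diffs(p1, p2)
k_gamma = sum((a - b) ** 2 for a, b in zip(p1, p2))  # ||p_1 - p_2||^2
```

For these vectors each expected diff equals $K\gamma/2$, the boundary case of the w.l.o.g. assumption $\eta \ge K\gamma/2$.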

We next show the following two claims.

Lemma 9.3. Given that $K \ge 8\ln(1/\tau)/\gamma$, $\Pr_X[\mathrm{diff}(X) \ge \eta - K\gamma/4] \ge 1 - \tau$.

Lemma 9.4. $\Pr_Y[\mathrm{diff}(Y) \ge (K\gamma - \eta) - K\gamma/4] \ge 1 - \tau$.

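Proof of Lemma 9.3: Let us define $\gamma_k = (p_1^k - p_2^k)^2$, $\forall k = 1, \ldots, K$, so that
$\sum_{k=1}^{K} \gamma_k = K\gamma$ by Proposition 9.3. Given that $x^1, \ldots, x^K$ are independent Bernoulli
random variables and $(p_1^k - p_2^k)x^k$ is either in $[0, \sqrt{\gamma_k}]$ or $[-\sqrt{\gamma_k}, 0]$, $\forall k = 1, \ldots, K$,
we apply the Hoeffding bound as in Theorem 7.1 with $t = K\gamma/4K = \gamma/4$:
\begin{align*}
\Pr_X\Big[-\sum_{k=1}^{K} (p_1^k - p_2^k)x^k + \eta \ge K\gamma/4\Big] &= \Pr_X\Big[\sum_{k=1}^{K} (p_1^k - p_2^k)x^k - \eta \le -K\gamma/4\Big] \\
&\le e^{-2K^2(\gamma/4)^2 / \sum_{k=1}^{K} (\sqrt{\gamma_k})^2} \\
&\le \tau.
\end{align*}
Thus we have that $\Pr_X\big[\sum_{k=1}^{K} (p_1^k - p_2^k)x^k \ge \eta - K\gamma/4\big] \ge 1 - \tau$. □

Proof of Lemma 9.4: Similarly to the proof of Lemma 9.3, we have
\[ \Pr_Y\Big[\sum_{k=1}^{K} (p_2^k - p_1^k)y^k - (K\gamma - \eta) \le -K\gamma/4\Big] \le \tau, \]
where $K\gamma - \eta = \mathbf{E}_{Y \sim D_2}[\mathrm{diff}(Y)]$. And hence
\[ \Pr_Y\Big[\sum_{k=1}^{K} (p_2^k - p_1^k)y^k \ge (K\gamma - \eta) - K\gamma/4\Big] \ge 1 - \tau. \]
□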
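To get a feel for the number of dimensions that the Hoeffding step in Lemma 9.3 asks for, the sketch below evaluates the tail bound numerically. It assumes that the exponent simplifies to $K\gamma/8$, which is what $2K^2(\gamma/4)^2$ divided by a range term of $K\gamma$ gives; the function names and the sample values of $\gamma$ and $\tau$ are illustrative only.

```python
import math


def tail_bound(k: int, gamma: float) -> float:
    # Hoeffding tail exp(-2 K^2 (gamma/4)^2 / (K gamma)) = exp(-K gamma / 8),
    # under the assumption that the squared ranges sum to K * gamma.
    return math.exp(-k * gamma / 8.0)


def min_dimensions(gamma: float, tau: float) -> int:
    # Smallest integer K for which the tail bound is at most tau.
    return math.ceil(8.0 * math.log(1.0 / tau) / gamma)


gamma, tau = 0.1, 0.01
k_needed = min_dimensions(gamma, tau)
```

With these sample values a few hundred dimensions suffice, and the requirement grows only logarithmically in $1/\tau$.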

The rest of the proof for Theorem 9.1 follows the proof for Theorem 8.1 as
presented in Chapter 8. Hence we only include changes and important steps.

9.4  Min-Cut  Reveals  the  Perfect  Partition

9.4.1  Probability  Space

We first introduce some notation regarding the simple probability space $(\Omega, \mathcal{F}, \Pr)$
as follows. The set $\Omega$ is the set of all possible outcomes for $2NK$ random bits,
where we denote each bit as $b_j^k$ for a point $j$ at dimension $k$.

Definition 9.9. The elementary events in the underlying sample space $(\Omega, \mathcal{F}, \Pr)$
are all possible $2^{2NK}$ choices of $n = 2NK$ bits. For $0 \le i \le n$ and $w \in \{0, 1\}^i$, let $B_w$
denote the event that the first $i$ bits equal the bit string $w$. Let $\mathcal{F}_i$ be the $\sigma$-field
generated by the partition of $\Omega$ into blocks $B_w$, for $w \in \{0, 1\}^i$. Then the sequence
$\mathcal{F}_0, \ldots, \mathcal{F}_n$ forms a filter. In the $\sigma$-field $\mathcal{F}_i$, the only valid events are the ones that
depend on the values of the first $i$ bits, and all such events are valid within.

Remark 9.2. Comparing the above definition and Definition 8.10, the only difference
lies in whether a pair of bits or a single bit defines the random variable along
a single dimension. In all other places, wherever the phrase “a pair of bits” or
“pairs of bits” is used, it should be replaced with “a single bit” or “bits”.

We first show that the perfect partition has the minimum expected value among
all balanced cuts, when summing up scores over all edges across the cut. To this end, we
modify the definition of the random variable $\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L)$ as in (8.1.5) to be:
\[ \mathrm{diff}(\mathcal{T}, (S,\bar{S}), L) = \mathrm{Rscore}(S,\bar{S}) - \mathrm{Rscore}(\mathcal{T}). \tag{9.4.20} \]
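The quantity in (9.4.20) can be made concrete with a brute-force search over balanced cuts on a toy sample. This sketch assumes that the Rscore of a cut is the sum of $\mathrm{Rscore}(X,Y) = \langle x, y \rangle$ over the edges crossing it; the function names and the four sample points are hypothetical, and the exhaustive search is only feasible for very small samples.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def rscore(x: Sequence[int], y: Sequence[int]) -> int:
    # Inner-product score of a single edge.
    return sum(xi * yi for xi, yi in zip(x, y))


def cut_score(points: List[Sequence[int]], side: Sequence[int]) -> int:
    # Total score over all edges that cross the cut (side, complement).
    inside = set(side)
    outside = [j for j in range(len(points)) if j not in inside]
    return sum(rscore(points[i], points[j]) for i in inside for j in outside)


def min_balanced_cut(points: List[Sequence[int]]) -> Tuple[Tuple[int, ...], int]:
    # Enumerate balanced cuts; node 0 is pinned to one side so that each
    # cut is visited once.
    half = len(points) // 2
    best_side, best_score = None, None
    for side in combinations(range(len(points)), half):
        if 0 not in side:
            continue
        score = cut_score(points, side)
        if best_score is None or score < best_score:
            best_side, best_score = side, score
    return best_side, best_score


# Points 0, 1 mimic one population; points 2, 3 mimic the other.
points = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
perfect = (0, 1)
best_side, best_score = min_balanced_cut(points)
# diff for the cut obtained by swapping one node on each side (L = 1).
gap = cut_score(points, (0, 2)) - cut_score(points, perfect)
```

Here the perfect partition attains the minimum, and the positive `gap` plays the role of $\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L)$ for a cut with one swapped pair.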

Proposition 9.4. $\mathbf{E}[\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L)] = (N - L)LK\gamma$.

Proof.
\begin{align}
\mathbf{E}[\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L)] &= (N - L)L\,\mathbf{E}_{X \sim D_1}[\mathrm{diff}(X)] + (N - L)L\,\mathbf{E}_{Y \sim D_2}[\mathrm{diff}(Y)] \tag{9.4.21} \\
&= (N - L)LK\gamma. \tag{9.4.22}
\end{align}
□

We work in probability space $(\Omega, \mathcal{F}, \Pr)$. For the rest of this section, for a balanced
cut $(S,\bar{S})$, the $2KL$-history $h = \{U_1, \ldots, U_L, V_1, \ldots, V_L\}$ is a random variable
whose value depends on $2KL$ random bits on $2L$ swapped nodes specified over
$(S,\bar{S})$ with respect to $\mathcal{T}$, as shown in Figure 8.1.1; recall that $x$ is the outcome of a
particular point $X$ in our sample.

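We next restate Proposition 8.2 as follows.

Proposition 9.5. Let $h$ be a random variable according to Definition 8.11. Then
\begin{align*}
\mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L)] &= (N - L)\sum_{j=1}^{L} \mathrm{diff}(U_j) + (N - L)\sum_{j=1}^{L} \mathrm{diff}(V_j) \\
&= (N - L)\sum_{j=1}^{L}\sum_{k=1}^{K} (p_1^k - p_2^k)(u_j^k - v_j^k),
\end{align*}
where $\mathrm{diff}(U_j)$ and $\mathrm{diff}(V_j)$ are defined in Definition 9.8.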

We keep Definition 8.13 for Bad Event except that we replace a Bad Node
Event $\mathcal{E}(Z)$, $\forall Z$, with the following.

Definition 9.10. (Bad Node Event) Let a bad node event $\mathcal{E}(Z)$ be the event that
$\mathrm{diff}(Z) < \mathbf{E}[\mathrm{diff}(Z)] - K\gamma/4$, where $Z$ is a sample point. Note this is an event in an
individual probability space $(\Omega_Z, \mathcal{F}_Z, \Pr_Z)$, where $(\Omega_Z, \mathcal{F}_Z, \Pr_Z)$ is defined over all
possible outcomes of $K$ random bits for sample point $Z$.

This immediately implies the following lemma, which only slightly modifies
Lemma 8.6.

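Lemma 9.5. For a balanced cut $(S,\bar{S})$, given a particular $2KL$-history $h \in \mathcal{F}_{2KL}$ on
the $2L$ swapped nodes such that $h \in \bar{\mathcal{E}}_1$,
\[ \mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L) \mid h \in \bar{\mathcal{E}}_1, f \text{ at random}] \ge L(N - L)K\gamma/2, \tag{9.4.23} \]
where expectation is over all possible outcomes of the $2(N - L)K$ random bits in $f$
in probability space $(\Omega_h, \mathcal{F}(\Omega_h), \Pr_h)$.

Proof. For a balanced cut $(S,\bar{S})$, given $h \in \bar{\mathcal{E}}_1$, where $h$ records $2KL$ bits over
swapped nodes $U_j, V_j$, $\forall j = 1, \ldots, L$, by Definition 9.10,
\begin{align}
\mathrm{diff}(U_j) &\ge \eta - K\gamma/4, \quad \forall j = 1, \ldots, L, \tag{9.4.24} \\
\mathrm{diff}(V_j) &\ge K\gamma - \eta - K\gamma/4, \quad \forall j = 1, \ldots, L, \tag{9.4.25}
\end{align}
and hence $\mathrm{diff}(U_j) + \mathrm{diff}(V_j) \ge K\gamma/2$, $\forall j = 1, \ldots, L$. Thus, in $(\Omega_h, \mathcal{F}(\Omega_h), \Pr_h)$,
where $f$ is at random and $h \in \bar{\mathcal{E}}_1$, we have from Proposition 9.5,
\begin{align*}
\mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L)] &= (N - L)\sum_{j=1}^{L} \mathrm{diff}(U_j) + (N - L)\sum_{j=1}^{L} \mathrm{diff}(V_j) \\
&\ge (N - L)\sum_{j=1}^{L} \big(\mathrm{diff}(U_j) + \mathrm{diff}(V_j)\big) \\
&\ge (N - L)LK\gamma/2.
\end{align*}
□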


And thus we have the following theorem.

Theorem 9.3. Given that all points are drawn from the probability space
$(\Omega, \mathcal{F}, \Pr)$ excluding $\mathcal{E}_1$, we have, $\forall$ balanced cut $(S,\bar{S})$, where $h$ is a particular
$2KL$-history corresponding to the $2L$ swapped nodes specified over $(S,\bar{S})$ with
respect to $\mathcal{T}$,
\[ \mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L)] \ge (N - L)LK\gamma/2, \tag{9.4.26} \]
where the conditional expectation is over each of the individually expanded probability
spaces $(\Omega_h, \mathcal{F}(\Omega_h), \Pr_h)$ given $h \in \bar{\mathcal{E}}_1$, where $\mathcal{E}_1$ is defined in Definition 8.16.
This statement remains true after we require that $h \in \bar{\mathcal{E}}_2$ in addition, where $\mathcal{E}_2$ is
defined in Definition 9.13.

9.4.2  Large  Deviation

We aim to define a Bad Deviation Event as in Definition 9.13, but we first overwrite
some definitions regarding $\mathcal{E}_2$.




Definition 9.11. Given vectors $u_1, \ldots, u_L$ and $v_1, \ldots, v_L$, where $u_j^k$, $v_j^k$ are the $k$-th bit
of $U_j$ and $V_j$ respectively,
\[ f^k(h) = \sum_{j=1}^{L} u_j^k - \sum_{j=1}^{L} v_j^k. \]

Definition 9.12. (Deviation Values) $\forall k = 1, \ldots, K$, let $t_k\sqrt{L}$ be the exact deviation
on $f^k(h)$, i.e., $f^k(h) - \mathbf{E}[f^k(h)] = t_k\sqrt{L}$, $\forall k$.

Definition 9.13. (Bad Deviation Event $\mathcal{E}_2$) In probability space $(\Omega, \mathcal{F}, \Pr)$, given
a balanced cut $(S,\bar{S})$ and its corresponding $2KL$-history $h$, $\mathcal{E}_2$ is the event such that
the set of random variables $t_1, \ldots, t_K$ regarding $2KL$ random bits recorded in $h$, as
defined in Definition 9.12, are simultaneously large and satisfy
\[ \sum_{k=1}^{K} t_k^2 \ge \Lambda = 8N\ln 2 + 4K\ln 2\,(\log\log N + 1) + 3\ln N/2. \]
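Definitions 9.11 to 9.13 translate directly into a small computation: form $f^k(h)$ from the swapped nodes, normalize its deviation by $\sqrt{L}$, and compare the sum of squares against $\Lambda$. The sketch below assumes that $\log\log N$ is taken base 2 and that $\mathbf{E}[f^k(h)] = L(p_1^k - p_2^k)$; the function names and the toy history are hypothetical.

```python
import math
from typing import List, Sequence


def deviation_threshold(n: int, k: int) -> float:
    # Lambda = 8 N ln 2 + 4 K ln 2 (log log N + 1) + 3 ln N / 2,
    # with log log N taken base 2 (an assumption of this sketch).
    loglog = math.log2(math.log2(n))
    return 8 * n * math.log(2) + 4 * k * math.log(2) * (loglog + 1) + 1.5 * math.log(n)


def deviation_values(
    us: List[Sequence[int]],
    vs: List[Sequence[int]],
    p1: Sequence[float],
    p2: Sequence[float],
) -> List[float]:
    # t_k = (f^k(h) - E[f^k(h)]) / sqrt(L), one value per dimension k.
    big_l = len(us)
    ts = []
    for k in range(len(p1)):
        f_k = sum(u[k] - v[k] for u, v in zip(us, vs))
        expected = big_l * (p1[k] - p2[k])
        ts.append((f_k - expected) / math.sqrt(big_l))
    return ts


def is_bad_deviation(ts: Sequence[float], lam: float) -> bool:
    # The bad deviation event holds when the squared deviations reach Lambda.
    return sum(t * t for t in ts) >= lam


us = [[1, 0], [1, 1]]  # swapped nodes U_1, U_2 (from D_1)
vs = [[0, 1], [0, 0]]  # swapped nodes V_1, V_2 (from D_2)
ts = deviation_values(us, vs, p1=[0.75, 0.5], p2=[0.25, 0.5])
lam = deviation_threshold(n=4, k=2)
bad = is_bad_deviation(ts, lam)
```

Because $\Lambda$ grows linearly in $N$ while each $t_k^2$ concentrates around a constant, the event is rare, which is what Lemma 9.8 quantifies.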

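Using Definitions 9.12 and 9.13, we immediately have the following lemma.

Lemma 9.6. Given that $h \in \bar{\mathcal{E}}_2$, we have, $\forall k$,
\[ |f^k(h)| \le |\mathbf{E}[f^k(h)]| + |t_k\sqrt{L}|, \]
and $\sum_{k=1}^{K} t_k^2 < \Lambda$, where $t_k$ is in Definition 9.12, and $\mathcal{E}_2$ is in Definition 9.13.

Proof. By definition of $t_k$, $\forall k$, we have that $f^k(h) = \mathbf{E}[f^k(h)] + t_k\sqrt{L}$, where
$t_k \in \Big[\frac{-L - \mathbf{E}[f^k(h)]}{\sqrt{L}}, \frac{L - \mathbf{E}[f^k(h)]}{\sqrt{L}}\Big]$.
Thus we immediately have
\[ |f^k(h)| \le |\mathbf{E}[f^k(h)]| + |t_k\sqrt{L}|, \]
where $\sum_{k=1}^{K} t_k^2 < \Lambda$, given that $h \in \bar{\mathcal{E}}_2$. □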

First let us obtain the expected value of $f^k(h)$, as in Definition 9.11.

Proposition 9.6. $\mathbf{E}[f^k(h)] = \mathbf{E}\big[\sum_{j=1}^{L} u_j^k - v_j^k\big] = L(p_1^k - p_2^k)$.

Next we examine the deviation for each random variable $f^k(h)$, $\forall k$.

Lemma 9.7. $\forall k$, for random variable $f^k(h)$ as in Definition 9.11,
\[ \Pr\big[\,|f^k(h) - \mathbf{E}[f^k(h)]| \ge t_k\sqrt{L}\,\big] \le 2e^{-t_k^2}. \tag{9.4.27} \]
In addition, events corresponding to different dimensions are independent.
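Proof. Let us define random variables $U^k$, $V^k$ such that
\[ U^k = \sum_{j=1}^{L} u_j^k / L \quad \text{and} \quad V^k = \sum_{j=1}^{L} v_j^k / L. \tag{9.4.28} \]
Thus by Proposition 9.6,
\[ \mathbf{E}[U^k] - \mathbf{E}[V^k] = \frac{1}{L}\,\mathbf{E}[f^k(h)] = p_1^k - p_2^k. \]
Now applying Corollary 7.1 of Theorem 7.1 to bound the probability of deviations
on both sides of the expected difference, with $t = t_k\sqrt{L}/L$, we have
\begin{align*}
\Pr\big[\,|f^k(h) - \mathbf{E}[f^k(h)]| \ge t_k\sqrt{L}\,\big] &= \Pr\big[\,|U^k - V^k - \mathbf{E}[U^k - V^k]| \ge t_k\sqrt{L}/L\,\big] \\
&\le 2e^{-t_k^2}.
\end{align*}
□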


Lemma 9.8. In probability space $(\Omega, \mathcal{F}, \Pr)$, for each balanced cut $(S,\bar{S})$,
\[ \Pr[h \in \mathcal{E}_2] \le p_2, \]

where  p2  =  0(  and  N  >1. 

Proof. The proof follows that of Lemma 8.4, except with the following modification:
(9.4.29) is replaced with the following. Let $r_k$ represent the number of
such intervals; we have, $\forall k$, so long as $N \ge 2$,
\begin{align}
r_k &\le \log\big(\sqrt{L} + |L(p_1^k - p_2^k)|/\sqrt{L}\big) \tag{9.4.29} \\
&\le \log 2\sqrt{L} \le \log 2\sqrt{N/2} \le \log N. \tag{9.4.30}
\end{align}


We also modify Lemma 8.5 as follows. Its proof follows a similar
line, except that the definition of $\Lambda$ here follows that of Definition 9.13, due to the
change of Lemma 9.7 as compared to Lemma 8.1.

Lemma 9.9. Let $\Lambda/4 = 2N\ln 2 + K(\ln 2)(\log\log N + 1) + (3\ln N)/8$, as $\Lambda$ is defined
in Definition 9.13.

1 

“  22"-(logAf)^-Af3/2' 

□ 


Pr 


hmaps  to  afixed^{^\. 


RA 


K 

W  r2  \  A  //I 


9.4.3  Bounded  Differences 

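We are now ready to use the bounded differences approach in $(\Omega_h, \mathcal{F}(\Omega_h), \Pr_h)$ and
prove the following variation of Theorem 8.3.

Theorem 9.4. Let $h$ be a possible $2KL$-history that we record for a balanced cut
$(S,\bar{S})$ such that $h \in \bar{\mathcal{E}}_1 \cap \bar{\mathcal{E}}_2$. Then, for $t > 0$, in probability space $(\Omega_h, \mathcal{F}(\Omega_h), \Pr_h)$,
where all future $2(N - L)K$ random bits $f$ are completely at random,
\[ \Pr_h\big[\,|\mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L) \mid \mathcal{H}^{(n)}] - \mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L)]| \ge t\,\big] \le 2e^{-t^2/(2\sigma^2)}, \]
where $\sigma^2 \le 4(N - L)L^2(K\gamma) + 4(N - L)L\Lambda$, for all balanced cuts $(S,\bar{S})$ with $0 < L \le
N/2$ swapped nodes.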

Proof. We should substitute all mentions of “a pair of bits” with a single bit; in
particular, wherever the corresponding pair-of-bits variables are used in the proof of
Theorem 8.3, we substitute $x_i^k$, $y_i^k$, which refer to a single bit at dimension $k$ of
points $X_i$ and $Y_i$ respectively.
In particular, we have
\[ d_{i,k}(X_i) = \big|\mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L) \mid \mathcal{H}^{(\ell)}, x_i^k] - \mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L) \mid \mathcal{H}^{(\ell)}]\big|. \tag{9.4.31} \]
And similarly, letting $\ell' = 2KL + (N - L)K + (i - 1)K + k - 1$, we have
\[ d_{i,k}(Y_i) = \big|\mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L) \mid \mathcal{H}^{(\ell')}, y_i^k] - \mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L) \mid \mathcal{H}^{(\ell')}]\big|. \]
We immediately have the following lemma that we can plug into Azuma’s
inequality, where $d_{i,k}$ applies to both $d_{i,k}(X_i)$ and $d_{i,k}(Y_i)$.

Lemma 9.10. For the $2(N - L)K$ random bits on unswapped nodes $X_i, Y_i$, $\forall i \in
[1, N - L]$, that we reveal, at dimension $k \in [1, K]$, we have
\[ d_{i,k} \le |L(p_2^k - p_1^k)| + |t_k\sqrt{L}|, \]
where $t_k$ is defined in Definition 9.12 and $\Lambda$ as in Definition 9.13, and $\sum_{k=1}^{K} t_k^2 < \Lambda$.

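Proof. This proof follows that of Lemma 8.8. Given that $Y_i$, $\forall i$, comes from $D_2$ and
$X_i$, $\forall i$, comes from $D_1$, and by definition of $d_{i,k}(Y_i)$ and $d_{i,k}(X_i)$,
\[ d_{i,k}(Y_i) = \begin{cases} p_2^k\,|f^k(h)| & : y_i^k = 0, \\ (1 - p_2^k)\,|f^k(h)| & : y_i^k = 1, \end{cases} \]
and
\[ d_{i,k}(X_i) = \begin{cases} p_1^k\,|f^k(h)| & : x_i^k = 0, \\ (1 - p_1^k)\,|f^k(h)| & : x_i^k = 1. \end{cases} \]
Hence given that $h \in \bar{\mathcal{E}}_2$, Lemma 9.6, and $|\mathbf{E}[f^k(h)]| = |L(p_2^k - p_1^k)|$ as in
Proposition 9.6,
\begin{align*}
d_{i,k}(Y_i) &\le |f^k(h)| \\
&\le |\mathbf{E}[f^k(h)]| + |t_k\sqrt{L}| \\
&= |L(p_2^k - p_1^k)| + |t_k\sqrt{L}|,
\end{align*}
and similarly, $d_{i,k}(X_i) \le |L(p_2^k - p_1^k)| + |t_k\sqrt{L}|$, where $\sum_{k=1}^{K} t_k^2 < \Lambda$.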

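We are now ready to obtain a bound for $\sigma^2$, the sum of $d_{i,k}^2$ over all revealed bits, where
$d_{i,k} \le |L(p_2^k - p_1^k)| + |\sqrt{L}\,t_k|$ applies to unswapped nodes $X_i, Y_i$, $\forall i = 1, \ldots, N - L$, in
bounding the differences they cause by revealing the random bits on dimension $k$.
Given that $\sum_{k=1}^{K} t_k^2 < \Lambda$,
\begin{align*}
\sigma^2 &= \sum_{i,k} \big(d_{i,k}^2(X_i) + d_{i,k}^2(Y_i)\big) = 2\sum_{i,k} d_{i,k}^2 \\
&\le 2\sum_{i=1}^{N-L}\sum_{k=1}^{K} \big(|L(p_2^k - p_1^k)| + |\sqrt{L}\,t_k|\big)^2 \\
&\le 2(N - L)\sum_{k} \Big(2\big(L(p_2^k - p_1^k)\big)^2 + 2\big(\sqrt{L}\,t_k\big)^2\Big) \\
&= 4L^2(N - L)\sum_{k} (p_1^k - p_2^k)^2 + 4L(N - L)\sum_{k} t_k^2 \\
&\le 4(N - L)L^2(K\gamma) + 4(N - L)L\Lambda,
\end{align*}
where $\Lambda = 8N\ln 2 + 4K\ln 2\,(\log\log N + 1) + 3\ln N/2$ as in Definition 9.13. □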
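The bound on $\sigma^2$ feeds directly into an Azuma-type tail estimate, and a quick numeric evaluation shows how the pieces interact. The sketch below assumes the tail has the form $2e^{-t^2/(2\sigma^2)}$ and takes $t = KL(N - L)\gamma/2$, the lower bound on the conditional expectation from Theorem 9.3; the function names and all parameter values are illustrative, not taken from the thesis.

```python
import math


def sigma_sq_bound(n: int, l: int, k: int, gamma: float, lam: float) -> float:
    # Upper bound 4 (N - L) L^2 (K gamma) + 4 (N - L) L Lambda on sigma^2.
    return 4 * (n - l) * l * l * k * gamma + 4 * (n - l) * l * lam


def azuma_tail(t: float, sigma_sq: float) -> float:
    # Assumed two-sided Azuma tail 2 exp(-t^2 / (2 sigma^2)), capped at 1.
    return min(1.0, 2.0 * math.exp(-t * t / (2.0 * sigma_sq)))


n, l, k, gamma, lam = 10, 2, 100, 0.5, 50.0
sigma_sq = sigma_sq_bound(n, l, k, gamma, lam)
t = k * l * (n - l) * gamma / 2  # lower bound on the expected diff
tail = azuma_tail(t, sigma_sq)
```

Since $t^2$ scales with $(K\gamma)^2$ while the first term of the bound scales with $K\gamma$, increasing $K$ drives the tail down, in line with the conditions on $K$ and $KN$ stated in the chapter.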

We  now  apply  Theorem  9.4  to  obtain  the  following  bound  on  a  bad  event.  Note 
that  the  constant  in  the  following  lemma  has  not  been  optimized. 

Lemma  9.11.  Let  h  be  the  specific  2KL-history  that  we  record  for  a  balanced  cut 
{S,  S)  such  that  h^'E^f]  Let  p|  =  Then 

Fr  [diff{E,{S,S),L)  <0|^G  E^  H  E^ ,  f  at  random]  <  pf, 

given  that  K  =  and  KN  =  qH  at  >  4. 


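(9.4.32)

(9.4.33)

(9.4.34)

Proof. We take $t = \mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L)] \ge KL(N - L)\gamma/2$ and plug it into
Theorem 9.4 to obtain the following:
\begin{align}
&\Pr[\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L) < 0 \mid h \in \bar{\mathcal{E}}_1 \cap \bar{\mathcal{E}}_2] \notag \\
&\quad = \Pr_h\big[\mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L) \mid \mathcal{H}^{(n)}] - \mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L)] < -\mathbf{E}_h[\mathrm{diff}(\mathcal{T}, (S,\bar{S}), L)]\big] \notag \\
&\quad \le 2e^{-t^2/(2\sigma^2)}, \tag{9.4.35}
\end{align}
where $\sigma^2$ is defined in Theorem 8.3.

We first rewrite $\sigma^2$,
\begin{align}
\sigma^2 &\le 4(N - L)L^2(K\gamma) + 4(N - L)L\Lambda \notag \\
&= 4(N - L)L^2(K\gamma) + (N - L)L\Lambda', \tag{9.4.36}
\end{align}
which is exactly 1/16 of what we have in the proof of Lemma 8.9, as in (8.5.55), where
$\Lambda' = 4\Lambda$. Given that $t^2 = (KL(N - L)\gamma)^2/4$ is also 1/16 of that in (8.5.54), the rest
of the calculation is exactly the same as that in the proof of Lemma 8.9, under the
condition stated in Lemma 9.11. □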

We  finish  this  chapter  by  noting  that,  to  prove  Theorem  9.1,  we  need  to  bundle 
the  modified  pieces  with  some  original  proofs  in  Chapter  8  according  to  the  outline 
shown  in  Section  8.3. 


10  Conclusions  and  Open  Problems 


10.1  Hierarchical  Routing 

As our work is motivated by routing and distributed data location applications in
large-scale distributed systems, where hierarchy is introduced to address scalability
issues, it is important to fully understand other issues such as robustness and
dynamics posed by such applications, model them appropriately, and adapt our
algorithms accordingly.

-  In particular, how can we adapt the randomized constructions to an online
setting, where nodes are allowed to join and leave the system dynamically,
while still guaranteeing the bound on path stretch and the “optimality” of the
decompositions?

-  We also lack a complete understanding of the relationship between routing
that optimizes path stretch and routing that optimizes congestion. How
can we effectively trade off these two goals?

10.2  EDP  and  Congestion  Minimization 

In EDPwC, the goal is to connect as many terminal pairs as possible subject to
the constraint that at most $\omega$ demands can be routed through any edge in the
graph. Note that when $\omega = O(\log n/\log\log n)$, we get a constant approximation via
randomized rounding, due to Raghavan and Thompson [1987]. For an undirected graph,
the strongest hardness of approximation bounds are $\Omega(\log^{1/2-\epsilon} n)$ for EDP and
$\Omega(\log^{(1-\epsilon)/(\omega+1)} n)$ for EDPwC due to Andrews et al. [2005].

(1)  Is it possible to improve the hardness of approximation for undirected EDP
to,  e.g.,  ^(log^^*^?!)? 

(2)  Is there a better approximation for the All-or-Nothing Multicommodity Flow
(ANF) problem such that the approximation ratio is $O(\log n)$?

(3)  Is it possible to extend our algorithm to obtain a polylogarithmic approximation
for undirected EDP in general graphs, in particular, when we allow
congestion $\omega$ to grow from 1 to perhaps $\log\log n$?

10.3  Classification 

In the context of this population classification problem, there are many open questions
that one needs to consider to come up with rigorous proofs:

(1)  How to extend this to biased cases, where two populations have different
sizes?  Currently, the max-cut theorem only works for balanced cases and the
proof techniques strongly rely on the fact that we compare only balanced
cuts with the perfect partition.

(2)  How to extend this analysis to multiple populations?

(3)  How to allow an admixture model, where each individual does not come from
a single population of origin, and instead each individual’s genotype can be a
mixture of several distributions?

(4)  How to analyze it when different loci are allowed to have correlations?  Our
current model assumes independence between different loci, and our program
draws samples from the genotype databases randomly to simulate this
independence.

Answering these questions will not only shed light on our understanding of the
underlying mathematical structure of DNA, but also have significance in defining
and answering problems in the classic domain of learning distributions.